Signal Shaping for BICM at Low SNR

Erik Agrell and Alex Alvarado 



Abstract — The mutual information of bit-interleaved coded 
modulation (BICM) systems, sometimes called the BICM ca- 
pacity, is investigated at low signal-to-noise ratio (SNR), i.e., in 
the wideband regime. A new linear transform that depends on 
bits' probabilities is introduced. This transform is used to prove 
the asymptotical equivalence between certain BICM systems 
with uniform and nonuniform input distributions. Using known 
results for BICM systems with a uniform input distribution, 
we completely characterize the combinations of input alphabet, 
input distribution, and binary labeling that achieve the Shannon 
limit -1.59 dB. The main conclusion is that a BICM system 
achieves the Shannon limit at low SNR if and only if it can 
be represented as a zero-mean linear projection of a hypercube, 
which is the same condition as for uniform input distributions. 
Hence, probabilistic shaping offers no extra degrees of freedom 
to optimize the low-SNR mutual information of BICM systems, 
in addition to what is provided by geometrical shaping. These 
analytical conclusions are confirmed by numerical results, which 
also show that for a fixed input alphabet, probabilistic shaping 
of BICM can improve the mutual information in the low and 
medium SNR range over any coded modulation system with a 
uniform input distribution. 

Index Terms — Binary labeling, bit-interleaved coded modu- 
lation, Hadamard transform, mutual information, probabilistic 
shaping, Shannon limit, wideband regime. 



I. Introduction 

The most important breakthrough for coded modulation
(CM) in fading channels came in 1992, when Zehavi introduced
the so-called bit-interleaved coded modulation (BICM) [1],
usually referred to as a pragmatic approach for CM [2], [3].
Despite not being fully understood theoretically,
BICM has been rapidly adopted in commercial systems such
as wireless and wired broadband access networks, 3G/4G
telephony, and digital video broadcasting, making it the
de facto standard for current telecommunications systems
[3, Ch. 1].

Signal shaping refers to the use of non-equally spaced
and/or non-equally likely symbols, i.e., geometrical shaping
and probabilistic shaping, resp. Signal shaping has been
studied for many years, cf. [4], [5] and references therein.
In the context of BICM, geometrical shaping was studied in
[6]-[8], and probabilistic shaping, i.e., varying the probabilities
of the bit streams, was first proposed in [9], [10] and
developed further in [11]-[13]. Probabilistic shaping offers
another degree of freedom in the BICM design, which can
be used to make the discrete input distribution more similar to
the optimal distribution (which is in general unknown). This
is particularly advantageous at low and medium SNR.

For the additive white Gaussian noise (AWGN) channel,
the so-called Shannon limit (SL) -1.59 dB represents the
average bit energy-to-noise ratio needed to transmit information
reliably when the signal-to-noise ratio (SNR) tends to
zero [14], [15], i.e., in the wideband regime. When discrete
input alphabets are considered at the transmitter and a BICM
decoder is used at the receiver, the SL is not always achieved,
as first noticed in [16]. This was later shown to be caused
by the selection of the binary labeling [17]. The behavior of
BICM in the wideband regime was studied in [16]-[20] as a
function of the alphabet ($\mathbb{X}$) and the binary labeling ($\mathbb{L}$),
assuming a uniform input distribution. First-order optimal (FOO)
constellations were defined in [20] as the triplets $[\mathbb{X}, \mathbb{P}, \mathbb{L}]$ that
make a BICM system achieve the SL, where $\mathbb{P}$ represents the
input distribution.

In this paper, we generalize the results of [20] to nonuniform
input distributions and give a complete characterization of 
FOO constellations for BICM in terms of [X, P, L]. More 
particularly, we find the geometrical and/or probabilistic shap- 
ing rules that should be applied to a constellation to make it 
FOO. The main conclusion is that probabilistic shaping offers 
no extra degrees of freedom in addition to what is provided 
by geometrical shaping for BICM systems in the wideband 
regime. At medium SNR, however, probabilistic shaping of 
BICM can improve the MI for a fixed input alphabet, which 
is shown numerically. 

This paper is organized as follows. The system model and
the preliminaries are presented in Sec. II, where a new discrete
transform is also introduced. In Sec. III, new exact expressions
are derived for the asymptotic behavior of the BICM-MI at
low SNR. Based on these expressions, FOO constellations are
characterized in Sec. IV for uniform and nonuniform input
distributions. Conclusions are drawn in Sec. V.

II. Preliminaries 

A. Notation 

Bold italic letters $\boldsymbol{x}$ denote row vectors. Block letters $\mathbb{X}$
denote matrices or sometimes column vectors. The identity
matrix is $\mathbb{I}$. The inner product between two row vectors $\boldsymbol{a}$
and $\boldsymbol{b}$ is denoted by $\langle \boldsymbol{a}, \boldsymbol{b} \rangle$ and their element-wise product
by $\boldsymbol{a} \circ \boldsymbol{b}$. The Euclidean norm of the vector $\boldsymbol{a}$ is denoted by
$\|\boldsymbol{a}\|$. Random variables are denoted by capital letters $X$ and
random vectors by boldface capital letters $\boldsymbol{X}$. The probability
density function (pdf) of the random vector $\boldsymbol{Y}$ is denoted
by $p_{\boldsymbol{Y}}(\boldsymbol{y})$ and the conditional pdf by $p_{\boldsymbol{Y}|\boldsymbol{X}}(\boldsymbol{y}|\boldsymbol{x})$. A similar
notation applies to probability mass functions (pmf) of a
random variable, which we denote by $P_Y(y)$ and $P_{Y|X}(y|x)$.
Expectations are denoted by $\mathbb{E}$.

Fig. 1. A generic BICM system, consisting of a BICM transmitter, the channel, and a BICM receiver.



The empty set is denoted by $\emptyset$ and the binary set by $\mathcal{B} =
\{0, 1\}$. The negation of a bit $b$ is denoted by $\bar{b} = 1 - b$. Binary
addition (exclusive-OR) of two bits $a$ and $b$ is denoted by
$a \oplus b$. The same notation $a \oplus b$ is used for the integer that
results from taking the bitwise exclusive-OR of two integers $a$
and $b$.

B. System Model 

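We consider transmissions over a discrete-time memoryless
vectorial fast fading channel. The received vector at any
discrete time instant is

\[ \boldsymbol{Y} = \boldsymbol{H} \circ \boldsymbol{X} + \boldsymbol{Z} \tag{1} \]

where $\boldsymbol{X}$ is the channel input and $\boldsymbol{Z}$ is Gaussian noise
with zero mean and variance $N_0/2$ in each dimension. The
channel is represented by the $N$-dimensional vector $\boldsymbol{H}$, and
it contains real fading coefficients $H_i$, which are assumed
to be random variables, possibly dependent, with the same pdf
$p_H(h)$. We assume that $\boldsymbol{H}$ and $N_0$ are perfectly known at the
receiver or can be perfectly estimated. We also assume that the
technical requirements on $\boldsymbol{X}$ and $\boldsymbol{H}$ stated in [20] are satisfied.
The conditional transition pdf of the channel in (1) is given by

\[ p_{\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H}}(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{h}) = \frac{1}{(N_0 \pi)^{N/2}} \exp\left( -\frac{\|\boldsymbol{y} - \boldsymbol{h} \circ \boldsymbol{x}\|^2}{N_0} \right). \tag{2} \]

The SNR is defined as

\[ \rho \triangleq \mathbb{E}_H[H^2] \frac{E_s}{N_0} = R_c \frac{E_b^{\mathrm{r}}}{N_0} \tag{3} \]

where $E_s = \mathbb{E}[\|\boldsymbol{X}\|^2]$ is the average transmitted symbol
energy, $R_c$ is the transmission rate in information bits per
symbol, and $E_b^{\mathrm{r}} \triangleq \mathbb{E}_H[H^2] E_s / R_c$ is the average received
energy per information bit.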

The generic BICM scheme in Fig. 1 is considered. The
transmitter is, in the simplest case, a single binary encoder
concatenated with an interleaver and a memoryless mapper
$\Phi$. Multiple encoders and/or interleavers may be needed to
achieve probabilistic shaping [10]-[13]. At the receiver, using
the channel output $\boldsymbol{Y}$, the demapper $\Phi^{-1}$ computes metrics
for the individual coded bits $C_k$ with $k = 0, \ldots, m-1$, usually
in the form of logarithmic likelihood ratios. These metrics are
then passed to the deinterleaver(s) and decoder(s) to obtain an
estimate of the information bits.

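The mapper $\Phi$ is defined via the input alphabet $\mathbb{X} =
[\boldsymbol{x}_0^T, \ldots, \boldsymbol{x}_{M-1}^T]^T \in \mathbb{R}^{M \times N}$, where $m$ bits are used to index
the symbol vectors $\boldsymbol{x}_i \in \mathbb{R}^N$ for $i = 0, \ldots, M-1$, $M = 2^m$.
We associate with each symbol $\boldsymbol{x}_i$ the codeword (binary
labeling) $\boldsymbol{c}_i = [c_{i,0}, \ldots, c_{i,m-1}] \in \mathcal{B}^m$ and the probability
$0 < P_i < 1$, where $P_i = P_{\boldsymbol{X}}(\boldsymbol{x}_i)$. The binary labeling is
denoted by $\mathbb{L} = [\boldsymbol{c}_0^T, \ldots, \boldsymbol{c}_{M-1}^T]^T \in \mathcal{B}^{M \times m}$ and the input
distribution by $\mathbb{P} = [P_0, \ldots, P_{M-1}]^T \in [0, 1]^M$.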

In the following, we define the labeling that we are going
to use throughout this paper. This can be done without loss of
generality, as we will explain in Sec. II-C.

Definition 1 (Natural binary code): The natural binary
code (NBC) is the binary labeling $\mathbb{N}_m = [\boldsymbol{n}(0)^T, \ldots, \boldsymbol{n}(M-1)^T]^T$,
where $\boldsymbol{n}(i) = [n_{i,0}, \ldots, n_{i,m-1}] \in \mathcal{B}^m$ denotes the
base-2 representation of the integer $0 \le i \le M-1$, with
$n_{i,m-1}$ being the most significant bit.

This definition of the NBC is different from the one in
[20]. The difference lies only in the bit ordering, i.e., in this
paper we consider the last column of $\mathbb{N}_m$ to contain the most
significant bits of the base-2 representation of the integers $i =
0, 1, \ldots, M-1$. It follows straightforwardly from Definition 1
that

\[ n_{2^l,k} = \begin{cases} 1, & k = l, \\ 0, & k \ne l, \end{cases} \tag{4} \]

for all $k = 0, \ldots, m-1$ and $l = 0, \ldots, m-1$, and

\[ n_{i \oplus j, k} = n_{i,k} \oplus n_{j,k} \tag{5} \]

for all $i = 0, \ldots, M-1$, $j = 0, \ldots, M-1$, and $k = 0, \ldots, m-1$.
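The NBC convention and the two properties (4) and (5) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `nbc_bits` is hypothetical, and the bit order follows Definition 1 (index $k = 0$ is the least significant bit).

```python
def nbc_bits(i, m):
    """Label n(i) = [n_{i,0}, ..., n_{i,m-1}], least significant bit first."""
    return [(i >> k) & 1 for k in range(m)]

m = 3
M = 2 ** m
N = [nbc_bits(i, m) for i in range(M)]   # rows of the labeling matrix N_m

# Property (4): the label of 2^l has a single one, located at position l.
prop4 = all(N[2 ** l][k] == (1 if k == l else 0)
            for l in range(m) for k in range(m))

# Property (5): the label of i XOR j is the bitwise XOR of the two labels.
prop5 = all(N[i ^ j][k] == N[i][k] ^ N[j][k]
            for i in range(M) for j in range(M) for k in range(m))

print(N[6], prop4, prop5)   # [0, 1, 1] True True
```

Because the most significant bit is stored last, the integer 6 (binary 110) is labeled `[0, 1, 1]`.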



C. Shaping in BICM 

Assuming independent, but possibly nonuniformly distributed,
bits $C_0, \ldots, C_{m-1}$ at the input of the modulator
(cf. Fig. Q]), the symbol probabilities are given by l20l eq. 
(30)] El eq. (8)] EH eq. (9)] 



\[ P_i = \prod_{k=0}^{m-1} P_{C_k}(c_{i,k}) \]

for $i = 0, \ldots, M-1$, where $P_{C_k}(u)$ for $u \in \mathcal{B}$ is the
probability of $C_k = u$. Since $P_{C_k}(1) = 1 - P_{C_k}(0)$, the
distribution $\mathbb{P}$ is fully specified by the vector of bit probabilities
$\boldsymbol{b} \triangleq [P_{C_0}(0), \ldots, P_{C_{m-1}}(0)]$.

Throughout this paper, we assume that $0 < P_{C_k}(0) < 1$ for
all $k = 0, \ldots, m-1$; i.e., we assume that all constellation
points are used with a nonzero probability. This can be
done without loss of generality, because if $P_{C_k}(0) = 0$ or
$P_{C_k}(0) = 1$ for some $k$, then half of the constellation points
will never be transmitted. If this is the case, we remove the
corresponding branches in Fig. 1, reduce $m$ by one, and
redefine the mapper $\Phi$ accordingly. The result is another
BICM scheme with identical performance, which satisfies
$0 < P_{C_k}(0) < 1$ for all $k$.

For any constellation [X, P, L], a set of equivalent constel- 
lations can be constructed by permuting the rows of X, L, and 
P, provided that the same permutation is applied to all three 
matrices. Specifically, it is possible to permute the rows of 
any labeling matrix L such that any other labeling is obtained. 
Without loss of generality, we therefore fix the labeling to be 
the NBC from now on. 

All results herein can be straightforwardly extended to an
arbitrary labeling $\mathbb{L}$ by permuting the rows of $\mathbb{X}$, $\mathbb{P}$, and
$\mathbb{N}_m$. Specifically, denote the permutation that maps the NBC
into the desired labeling $\mathbb{L}$ by $\Pi$, i.e., $\Pi(\mathbb{N}_m) = \mathbb{L}$. The
BICM system defined by the alphabet $\Pi(\mathbb{X})$, the distribution
$\Pi(\mathbb{P})$, and the labeling $\Pi(\mathbb{N}_m) = \mathbb{L}$ is entirely equivalent
to the system with alphabet $\mathbb{X}$, distribution $\mathbb{P}$, and labeling
$\mathbb{N}_m$. In particular, the two systems have the same BICM-MI.
Without loss of generality, the analysis in this paper is
therefore restricted to the latter case.

Based on the previous discussion, from now on we use the
name constellation to denote the pair $[\mathbb{X}, \mathbb{P}]$, where the NBC
labeling is implicit. Thus, $\mathbb{L} = \mathbb{N}_m$ and $c_{i,k} = n_{i,k}$ for all $i$
and $k$, which simplifies the analysis. Note that $\mathbb{P}$ cannot be
chosen arbitrarily in BICM; only distributions that satisfy

\[ P_i = \prod_{k=0}^{m-1} P_{C_k}(n_{i,k}) \tag{6} \]

for some vector of bit probabilities $\boldsymbol{b}$ will be considered
in the paper. An important special case is the uniform
distribution, for which $\boldsymbol{b} = [1/2, \ldots, 1/2]$ and $\mathbb{P} = \mathbb{U}_m =
[1/M, \ldots, 1/M]^T$.
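As an illustration of (6), the following sketch computes the symbol probabilities from a vector of bit probabilities. The function name `symbol_probabilities` is an assumption made for this example; `b[k]` plays the role of $P_{C_k}(0)$ and the NBC labeling is implied.

```python
def symbol_probabilities(b):
    """Symbol probabilities P_i from bit probabilities b[k] = P_{C_k}(0), per (6)."""
    m = len(b)
    probs = []
    for i in range(2 ** m):
        p = 1.0
        for k in range(m):
            bit = (i >> k) & 1                      # n_{i,k}, LSB first
            p *= b[k] if bit == 0 else 1.0 - b[k]   # P_{C_k}(n_{i,k})
        probs.append(p)
    return probs

P = symbol_probabilities([0.35, 0.50])
print([round(p, 3) for p in P])    # [0.175, 0.325, 0.175, 0.325]

U = symbol_probabilities([0.5, 0.5, 0.5])   # uniform special case, all 1/8
```

Only $m$ free parameters are available, so most distributions on $M = 2^m$ points cannot be produced this way, which is the restriction noted above.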

D. The Hadamard Transform 

The Hadamard transform (HT), or Walsh-Hadamard transform,
is a discrete, linear, orthogonal transform, whose coefficients
take values in $\pm 1$. It is popular in image processing
[21] and can also be used to analyze various aspects of binary
labelings in digital communications and source coding [20],
[22]-[25].

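Definition 2: The HT $\tilde{\mathbb{X}} = [\tilde{\boldsymbol{x}}_0^T, \ldots, \tilde{\boldsymbol{x}}_{M-1}^T]^T$ of a matrix
(or vector) $\mathbb{X} = [\boldsymbol{x}_0^T, \ldots, \boldsymbol{x}_{M-1}^T]^T$ with $M = 2^m$ rows is

\[ \tilde{\boldsymbol{x}}_i \triangleq \frac{1}{M} \sum_{j=0}^{M-1} \boldsymbol{x}_j h_{i,j}, \quad i = 0, \ldots, M-1 \tag{7} \]

where for all $i = 0, \ldots, M-1$ and $j = 0, \ldots, M-1$

\[ h_{i,j} \triangleq \prod_{k=0}^{m-1} (-1)^{n_{i,k} n_{j,k}}. \tag{8} \]

Because $n_{0,k} = 0$ for $k = 0, \ldots, m-1$, setting $i = 0$ in
(7)-(8) shows that the first HT vector

\[ \tilde{\boldsymbol{x}}_0 = \frac{1}{M} \sum_{j=0}^{M-1} \boldsymbol{x}_j \tag{9} \]

can be interpreted as the uniformly weighted mean of the
alphabet. This is a property that the HT shares with, e.g., the
discrete Fourier transform.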
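It can be shown from (8) that

\[ \sum_{i=0}^{M-1} h_{i,j} h_{i,l} = \begin{cases} M, & j = l, \\ 0, & j \ne l \end{cases} \tag{10} \]

for all $j = 0, \ldots, M-1$ and $l = 0, \ldots, M-1$. Therefore, the
inverse transform is identical to the forward transform, apart
from a scale factor:

\[ \boldsymbol{x}_j = \sum_{i=0}^{M-1} \tilde{\boldsymbol{x}}_i h_{j,i}, \quad j = 0, \ldots, M-1. \tag{11} \]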
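The forward and inverse HT in (7) and (11) can be sketched as follows. The function names are illustrative, and the coefficient $h_{i,j}$ is evaluated as $(-1)$ raised to the number of bit positions in which both NBC labels equal one, which is what the product in (8) reduces to.

```python
def h(i, j):
    """Hadamard coefficient (8): sign given by the parity of popcount(i AND j)."""
    return -1 if bin(i & j).count("1") % 2 else 1

def hadamard_transform(X):
    """Forward HT (7) of a list of M = 2^m row vectors."""
    M = len(X)
    return [[sum(X[j][d] * h(i, j) for j in range(M)) / M
             for d in range(len(X[0]))] for i in range(M)]

def inverse_hadamard_transform(Xt):
    """Inverse HT (11): same kernel as (7), without the 1/M factor."""
    M = len(Xt)
    return [[sum(Xt[i][d] * h(j, i) for i in range(M))
             for d in range(len(Xt[0]))] for j in range(M)]

X = [[-3.0], [-1.0], [3.0], [1.0]]          # a 4-PAM alphabet, one row per symbol
Xt = hadamard_transform(X)
X_back = inverse_hadamard_transform(Xt)

# Orthogonality (10): sum_i h(i,j) h(i,l) equals M if j == l and 0 otherwise.
M = len(X)
orthogonal = all(sum(h(i, j) * h(i, l) for i in range(M)) == (M if j == l else 0)
                 for j in range(M) for l in range(M))
print(Xt, orthogonal)
```

For this alphabet the first HT row is zero, reflecting the zero uniformly weighted mean in (9).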

E. A New Transform 

In this section, we define a linear transform between vectors
or matrices, which depends on the input distribution $\mathbb{P}$ via the
bit probabilities $\boldsymbol{b}$. Its usage will become clear in Section III-C.

For all $i = 0, \ldots, M-1$ and $j = 0, \ldots, M-1$, we define
the transform coefficients

\[ g_{i,j} \triangleq \prod_{k=0}^{m-1} \left( \sqrt{P_{C_k}(n_{j,k})} + (-1)^{n_{i,k} + n_{j,k}} \sqrt{P_{C_k}(\bar{n}_{j,k})} \right). \tag{12} \]

Note that they are nonsymmetric, in the sense that in general
$g_{i,j} \ne g_{j,i}$, in contrast to the HT coefficients $h_{i,j}$, for which
$h_{i,j} = h_{j,i}$. The transform coefficients $g_{i,j}$, however, have
other appealing properties given by the following lemma,
which will be used in the proofs of Theorems 3, 4, and 8.
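Lemma 1: For any $j = 0, \ldots, M-1$ and $l = 0, \ldots, M-1$,

\[ \sum_{i=0}^{M-1} g_{i,j} g_{i,l} = \begin{cases} M, & j = l, \\ 0, & j \ne l, \end{cases} \tag{13} \]

\[ \sum_{i=0}^{M-1} g_{i,j} h_{i,l} = M h_{j,l} \sqrt{P_{j \oplus l}} \tag{14} \]

where $P_j$ is given by (6) and $h_{i,j}$ is the Hadamard coefficient
defined in (8).

Proof: See the Appendix. □

We pay particular attention to two important special cases
of (14). First, if $l = 0$, then $h_{i,l} = h_{j,l} = 1$ and $P_{j \oplus l} = P_j$
for $j = 0, \ldots, M-1$. Second, if $l = 2^k$ for any integer
$k = 0, \ldots, m-1$, then by (4) and (8), $h_{i,l} = h_{l,i} = (-1)^{n_{i,k}}$ for any
$i = 0, \ldots, M-1$, and by (6)

\[ P_{j \oplus 2^k} = \prod_{k'=0}^{m-1} P_{C_{k'}}(n_{j \oplus 2^k, k'}). \]

Using first (5) and then (4), we obtain

\[ P_{j \oplus 2^k} = \prod_{k'=0}^{m-1} P_{C_{k'}}(n_{j,k'} \oplus n_{2^k,k'}) = P_j \frac{P_{C_k}(n_{j,k} \oplus 1)}{P_{C_k}(n_{j,k})} = P_j \frac{P_{C_k}(\bar{n}_{j,k})}{P_{C_k}(n_{j,k})}. \]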

Substituting these two cases ($l = 0$ and $l = 2^k$) into (14)
proves the following corollary.

Corollary 2: For any $j = 0, \ldots, M-1$ and $k = 0, \ldots, m-1$,

\[ \sum_{i=0}^{M-1} g_{i,j} = M \sqrt{P_j}, \tag{15} \]

\[ \sum_{i=0}^{M-1} g_{i,j} (-1)^{n_{i,k}} = M (-1)^{n_{j,k}} \sqrt{P_j \frac{P_{C_k}(\bar{n}_{j,k})}{P_{C_k}(n_{j,k})}}. \tag{16} \]

Example 1: If the bit probabilities are $\boldsymbol{b} =
[0.35, 0.50]$, then the symbol probabilities (6) are
$\mathbb{P} = [0.175, 0.325, 0.175, 0.325]^T$. The transform coefficients
$g_{i,j}$ in (12) are the elements at row $i$, column $j$ of

\[ \mathbb{G} = \begin{bmatrix} 1.977 & 0.304 & 0 & 0 \\ -0.304 & 1.977 & 0 & 0 \\ 0 & 0 & 1.977 & 0.304 \\ 0 & 0 & -0.304 & 1.977 \end{bmatrix}. \tag{17} \]

It is readily verified that $\mathbb{G}^T \mathbb{G} = M \mathbb{I}$, which is (13) in
matrix notation. The mean values in each column of (17) are
$[0.418, 0.570, 0.418, 0.570]^T$, which in agreement with (15)
are the square roots of the elements in $\mathbb{P}$. Similarly, it can be
shown that $\mathbb{G}$ in (17) satisfies (14) and (16).
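The numbers in Example 1 can be reproduced with a short script. This sketch assumes that each coefficient factors over the bit positions as $\sqrt{P_{C_k}(n_{j,k})} + (-1)^{n_{i,k}+n_{j,k}} \sqrt{P_{C_k}(\bar{n}_{j,k})}$; that form is adopted here because it reproduces the entries of (17) and the orthogonality and column-sum properties stated above. The function name `g` is illustrative.

```python
from math import sqrt

def g(i, j, b):
    """Coefficient g_{i,j} for bit probabilities b[k] = P_{C_k}(0) (assumed product form)."""
    prod = 1.0
    for k in range(len(b)):
        ni, nj = (i >> k) & 1, (j >> k) & 1
        p_nj = b[k] if nj == 0 else 1.0 - b[k]    # P_{C_k}(n_{j,k})
        sign = -1.0 if ni != nj else 1.0          # (-1)^(n_{i,k} + n_{j,k})
        prod *= sqrt(p_nj) + sign * sqrt(1.0 - p_nj)
    return prod

b = [0.35, 0.50]
M = 2 ** len(b)
G = [[g(i, j, b) for j in range(M)] for i in range(M)]

# Gram matrix of the columns; orthogonality means it equals M times the identity.
gram = [[sum(G[i][j] * G[i][l] for i in range(M)) for l in range(M)]
        for j in range(M)]

# Column means; these should equal the square roots of the symbol probabilities.
col_means = [sum(G[i][j] for i in range(M)) / M for j in range(M)]

for row in G:
    print([round(v, 3) for v in row])
print([round(v, 3) for v in col_means])   # [0.418, 0.57, 0.418, 0.57]
```

The block structure of the printed matrix comes from the second bit being equiprobable, which makes its per-bit factor diagonal.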

The fact that the sums $\sum_{i=0}^{M-1} g_{i,j} g_{i,l}$ in (13) are zero
whenever $j \ne l$, independently of the input distribution,
implies that the coefficients $g_{i,j}$ form an orthogonal basis. We
define a linear transform onto this basis as follows.

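Definition 3: Given the bit probabilities $\boldsymbol{b} =
[P_{C_0}(0), \ldots, P_{C_{m-1}}(0)]$, the transform $\check{\mathbb{X}} = [\check{\boldsymbol{x}}_0^T, \ldots, \check{\boldsymbol{x}}_{M-1}^T]^T$
of a matrix (or vector) $\mathbb{X} = [\boldsymbol{x}_0^T, \ldots, \boldsymbol{x}_{M-1}^T]^T$ with $M = 2^m$
rows is

\[ \check{\boldsymbol{x}}_i \triangleq \sum_{j=0}^{M-1} \boldsymbol{x}_j g_{i,j} \sqrt{P_j}, \quad i = 0, \ldots, M-1 \tag{18} \]

with $P_j$ given by (6).

Remark 1: For equally likely symbols, i.e., $\mathbb{P} = \mathbb{U}_m$, the
transform becomes the identity operation $\check{\mathbb{X}} = \mathbb{X}$, because then
$g_{i,i} = \sqrt{M}$ for $i = 0, \ldots, M-1$ and $g_{i,j} = 0$ for $i \ne j$.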

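The transform is invertible, as shown in the next theorem.

Theorem 3: Given the bit probabilities $\boldsymbol{b} =
[P_{C_0}(0), \ldots, P_{C_{m-1}}(0)]$, the inverse transform
$\mathbb{X} = [\boldsymbol{x}_0^T, \ldots, \boldsymbol{x}_{M-1}^T]^T$ of a matrix (or vector)
$\check{\mathbb{X}} = [\check{\boldsymbol{x}}_0^T, \ldots, \check{\boldsymbol{x}}_{M-1}^T]^T$ is

\[ \boldsymbol{x}_j = \frac{1}{M \sqrt{P_j}} \sum_{i=0}^{M-1} \check{\boldsymbol{x}}_i g_{i,j}, \quad j = 0, \ldots, M-1. \tag{19} \]

Proof: For $j = 0, \ldots, M-1$, (18) gives

\[ \sum_{i=0}^{M-1} \check{\boldsymbol{x}}_i g_{i,j} = \sum_{i=0}^{M-1} \sum_{l=0}^{M-1} \boldsymbol{x}_l g_{i,l} \sqrt{P_l}\, g_{i,j} = \sum_{l=0}^{M-1} \boldsymbol{x}_l \sqrt{P_l} \sum_{i=0}^{M-1} g_{i,l} g_{i,j}. \]

Applying (13) to the inner sum and dividing both sides by
$M \sqrt{P_j}$, which by Sec. II-B is nonzero, completes the proof. □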

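Example 2: The Gray-labeled 4-PAM constellation $[\mathbb{X}, \mathbb{P}]$
is considered, where $\mathbb{X} = [-3, -1, 3, 1]^T$ and $\mathbb{P}$ is the
same as in Example 1. Rewriting (18) in matrix notation,
the transform can be calculated as $\check{\mathbb{X}} = \mathbb{G} \mathbb{D}^{1/2} \mathbb{X}$,
where $\mathbb{D} = \mathrm{diag}(\mathbb{P})$. With $\mathbb{G}$ given by (17), the transform
$\check{\mathbb{X}} = [-2.654, -0.746, 2.654, 0.746]^T$ is obtained. This is a
nonequally spaced 4-PAM alphabet, which will be illustrated
and analyzed in Example 4. The inverse transform (19) can
be written as $\mathbb{X} = (1/M) \mathbb{D}^{-1/2} \mathbb{G}^T \check{\mathbb{X}}$. In the special case of a
uniform distribution, $\mathbb{G} = \mathbb{D}^{-1/2} = \sqrt{M}\, \mathbb{I}$, which agrees with
Remark 1.

Fig. 2. Relations between transforms of transforms. (Diagram: $\mathbb{X}$ and $\mathbb{S} = \check{\mathbb{X}}$ are linked by Def. 3 and Th. 3; each is linked to its HT, $\tilde{\mathbb{X}}$ and $\tilde{\mathbb{S}}$ respectively; the two HTs are linked by Th. 4.)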
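The forward transform and its inverse can be tried out on the alphabet of Example 2. The sketch below handles a real-valued (one-dimensional) alphabet only and again assumes the per-bit product form of $g_{i,j}$ that reproduces (17); the helper names are hypothetical.

```python
from math import sqrt

def pc(b, k, u):
    """P_{C_k}(u) for bit probabilities b[k] = P_{C_k}(0)."""
    return b[k] if u == 0 else 1.0 - b[k]

def symbol_prob(j, b):
    """Symbol probability P_j as the product of bit probabilities."""
    p = 1.0
    for k in range(len(b)):
        p *= pc(b, k, (j >> k) & 1)
    return p

def g(i, j, b):
    """Assumed per-bit product form of the transform coefficients."""
    prod = 1.0
    for k in range(len(b)):
        ni, nj = (i >> k) & 1, (j >> k) & 1
        sign = -1.0 if ni != nj else 1.0
        prod *= sqrt(pc(b, k, nj)) + sign * sqrt(pc(b, k, 1 - nj))
    return prod

def transform(X, b):
    """Forward transform: x_check_i = sum_j x_j g_{i,j} sqrt(P_j)."""
    M = len(X)
    return [sum(X[j] * g(i, j, b) * sqrt(symbol_prob(j, b)) for j in range(M))
            for i in range(M)]

def inverse_transform(Xc, b):
    """Inverse transform: x_j = (1 / (M sqrt(P_j))) sum_i x_check_i g_{i,j}."""
    M = len(Xc)
    return [sum(Xc[i] * g(i, j, b) for i in range(M)) / (M * sqrt(symbol_prob(j, b)))
            for j in range(M)]

b = [0.35, 0.50]
X = [-3.0, -1.0, 3.0, 1.0]
X_check = transform(X, b)
X_back = inverse_transform(X_check, b)
print([round(v, 3) for v in X_check])   # [-2.654, -0.746, 2.654, 0.746]
```

With `b = [0.5, 0.5]` the same functions return the input unchanged, which matches the identity behavior described in Remark 1.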

In Sec. IV-B, we will need to apply the HT and the new
transform after each other to the same alphabet. However, the
two transforms do not commute, and the result will therefore
depend on the order in which the transforms are applied. Of
particular interest for our analysis is the setup in Fig. 2, where
$\mathbb{X}$ and $\mathbb{S}$ are related via the transform defined above. Their
HTs $\tilde{\mathbb{X}}$ and $\tilde{\mathbb{S}}$ are however not related via the same transform.
Instead, a relation between $\tilde{\mathbb{X}}$ and $\tilde{\mathbb{S}}$ can be established via the
following theorem.

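Theorem 4: If $\mathbb{S} = \check{\mathbb{X}}$, then their HTs $\tilde{\mathbb{S}}$ and $\tilde{\mathbb{X}}$ are related
as

\[ \tilde{\boldsymbol{s}}_i = \psi_i \sum_{j=0}^{M-1} \tilde{\boldsymbol{x}}_j \prod_{\substack{k=0 \\ n_{j,k} \ne n_{i,k}}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(n_{j,k}) \right), \quad i = 0, \ldots, M-1, \tag{20} \]

\[ \tilde{\boldsymbol{x}}_i = \sum_{j=0}^{M-1} \frac{\tilde{\boldsymbol{s}}_j}{\psi_j} \prod_{\substack{k=0 \\ n_{j,k} \ne n_{i,k}}}^{m-1} \left( P_{C_k}(n_{j,k}) - P_{C_k}(0) \right), \quad i = 0, \ldots, M-1, \tag{21} \]

where

\[ \psi_i \triangleq \prod_{\substack{k=0 \\ n_{i,k} = 1}}^{m-1} 2 \sqrt{P_{C_k}(0) P_{C_k}(1)}, \quad i = 0, \ldots, M-1 \tag{22} \]

and a product over $\emptyset$ is defined as 1.

Proof: See the Appendix. □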

Remark 2: The summation in (20) can be confined to
$\sum_{j=i}^{M-1}$, because whenever $j < i$, there exists at least one
bit position $k$ for which $n_{i,k} \ne n_{j,k} = 0$. Analogously, the
summation in (21) can be confined to $\sum_{j=i}^{M-1}$.

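Example 3: The relation (20) can be written as $\tilde{\mathbb{S}} =
\mathbb{T} \tilde{\mathbb{X}}$, which implies $\tilde{\mathbb{X}} = \mathbb{T}^{-1} \tilde{\mathbb{S}}$. The elements at
row $i$, column $j$ of $\mathbb{T}$ and $\mathbb{T}^{-1}$ are given by (20)-(21)
as, resp., $\psi_i \prod_{k: n_{j,k} \ne n_{i,k}} (P_{C_k}(0) - P_{C_k}(n_{j,k}))$ and
$(1/\psi_j) \prod_{k: n_{j,k} \ne n_{i,k}} (P_{C_k}(n_{j,k}) - P_{C_k}(0))$. With $\boldsymbol{b}$, $\mathbb{X}$, and
$\check{\mathbb{X}}$ from Examples 1 and 2, we obtain $[\psi_0, \ldots, \psi_3] =
[1, 0.954, 1, 0.954]$ and

\[ \mathbb{T} = \begin{bmatrix} 1 & -0.300 & 0 & 0 \\ 0 & 0.954 & 0 & 0 \\ 0 & 0 & 1 & -0.300 \\ 0 & 0 & 0 & 0.954 \end{bmatrix}, \tag{23} \]

\[ \mathbb{T}^{-1} = \begin{bmatrix} 1 & 0.315 & 0 & 0 \\ 0 & 1.048 & 0 & 0 \\ 0 & 0 & 1 & 0.315 \\ 0 & 0 & 0 & 1.048 \end{bmatrix}. \tag{24} \]

As predicted by Remark 2, these matrices are upper triangular.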

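Another relation between $\tilde{\mathbb{X}}$ and $\tilde{\mathbb{S}}$ can be deduced from
Fig. 2. Defining the Hadamard matrix $\mathbb{H}$ as the matrix with
elements $h_{i,j}$ for $i, j = 0, \ldots, M-1$, the HT relations (7) and
(11) yield $\tilde{\mathbb{S}} = (1/M) \mathbb{H} \mathbb{S}$ and $\mathbb{X} = \mathbb{H} \tilde{\mathbb{X}}$. Since from Example 2
$\mathbb{S} = \check{\mathbb{X}} = \mathbb{G} \mathbb{D}^{1/2} \mathbb{X}$, we conclude that $\tilde{\mathbb{S}} = (1/M) \mathbb{H} \mathbb{G} \mathbb{D}^{1/2} \mathbb{H} \tilde{\mathbb{X}}$,
which implies that $\mathbb{T} = (1/M) \mathbb{H} \mathbb{G} \mathbb{D}^{1/2} \mathbb{H}$. Because $\mathbb{H}^{-1} =
(1/M) \mathbb{H}$ (see (10)) and $\mathbb{G}^{-1} = (1/M) \mathbb{G}^T$, the inverse relation
is $\mathbb{T}^{-1} = (1/M^2) \mathbb{H} \mathbb{D}^{-1/2} \mathbb{G}^T \mathbb{H}$. It is straightforward to verify
that $\mathbb{T}$ and $\mathbb{T}^{-1}$ calculated in this manner, using the numerical
values of $\mathbb{G}$ and $\mathbb{D}$ in Examples 1 and 2, indeed yield (23)-(24).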
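The two routes to the matrix relating the HTs can be compared numerically. The sketch below builds the matrix once from the matrix product $(1/M)\, H G D^{1/2} H$ and once element by element, using the reading that element $(i,j)$ equals $\psi_i$ times the product of $P_{C_k}(0) - P_{C_k}(n_{j,k})$ over the bit positions where the labels of $i$ and $j$ differ. Both that closed form and the product form of $g_{i,j}$ are assumptions of this example, chosen because they reproduce the numerical values quoted in Example 3.

```python
from math import sqrt

def pc(b, k, u):
    """P_{C_k}(u) for bit probabilities b[k] = P_{C_k}(0)."""
    return b[k] if u == 0 else 1.0 - b[k]

def symbol_prob(j, b):
    p = 1.0
    for k in range(len(b)):
        p *= pc(b, k, (j >> k) & 1)
    return p

def h(i, j):
    """Hadamard coefficient: sign given by the parity of popcount(i AND j)."""
    return -1.0 if bin(i & j).count("1") % 2 else 1.0

def g(i, j, b):
    """Assumed per-bit product form of the transform coefficients."""
    prod = 1.0
    for k in range(len(b)):
        ni, nj = (i >> k) & 1, (j >> k) & 1
        sign = -1.0 if ni != nj else 1.0
        prod *= sqrt(pc(b, k, nj)) + sign * sqrt(pc(b, k, 1 - nj))
    return prod

def t_closed_form(i, j, b):
    """Element (i, j): psi_i times the product over differing bit positions."""
    value = 1.0
    for k in range(len(b)):
        ni, nj = (i >> k) & 1, (j >> k) & 1
        if ni == 1:                       # factor of psi_i
            value *= 2.0 * sqrt(b[k] * (1.0 - b[k]))
        if ni != nj:                      # factor for a differing bit position
            value *= pc(b, k, 0) - pc(b, k, nj)
    return value

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

b = [0.35, 0.50]
M = 2 ** len(b)
H = [[h(i, j) for j in range(M)] for i in range(M)]
G = [[g(i, j, b) for j in range(M)] for i in range(M)]
D_half = [[sqrt(symbol_prob(i, b)) if i == j else 0.0 for j in range(M)]
          for i in range(M)]

T_matrix = [[v / M for v in row]
            for row in matmul(matmul(matmul(H, G), D_half), H)]
T_formula = [[t_closed_form(i, j, b) for j in range(M)] for i in range(M)]
max_diff = max(abs(T_matrix[i][j] - T_formula[i][j])
               for i in range(M) for j in range(M))
print([round(v, 3) for v in T_formula[0]], [round(v, 3) for v in T_formula[1]])
```

The element-wise form makes the upper-triangular structure visible directly: any position where $n_{i,k} = 1$ and $n_{j,k} = 0$ contributes a zero factor.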

III. BICM at low SNR 

A. Mutual Information 

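The mutual information (MI) in bits per channel use between
the random vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ for an arbitrary channel
parameter $\boldsymbol{H}$ perfectly known at the receiver is defined as

\[ I(\boldsymbol{X}; \boldsymbol{Y} | \boldsymbol{H}) \triangleq \mathbb{E}\left[ \log_2 \frac{p_{\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H}}(\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H})}{p_{\boldsymbol{Y}|\boldsymbol{H}}(\boldsymbol{Y}|\boldsymbol{H})} \right] \tag{25} \]

where the expectation is taken over the joint pdf $p_{\boldsymbol{X},\boldsymbol{Y},\boldsymbol{H}}$, and
$p_{\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H}}$ is given by (2).

The MI between $\boldsymbol{X}$ and $\boldsymbol{Y}$ conditioned on the value of the
$k$th bit at the input of the modulator is defined as

\[ I(\boldsymbol{X}; \boldsymbol{Y} | \boldsymbol{H}, C_k) \triangleq \mathbb{E}\left[ \log_2 \frac{p_{\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H},C_k}(\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{H},C_k)}{p_{\boldsymbol{Y}|\boldsymbol{H},C_k}(\boldsymbol{Y}|\boldsymbol{H},C_k)} \right] \tag{26} \]

where the expectation is taken over the joint pdf $p_{\boldsymbol{X},\boldsymbol{Y},\boldsymbol{H},C_k}$.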
Definition 4 (BICM Mutual Information): The BICM mutual
information (BICM-MI) is defined as [2], [16], [17]

\[ \begin{aligned} I(\rho) &\triangleq \sum_{k=0}^{m-1} I(C_k; \boldsymbol{Y} | \boldsymbol{H}) \\ &= m I(\boldsymbol{X}; \boldsymbol{Y} | \boldsymbol{H}) - \sum_{k=0}^{m-1} I(\boldsymbol{X}; \boldsymbol{Y} | \boldsymbol{H}, C_k) \end{aligned} \tag{27} \]

where the second line follows by the chain rule. We will
analyze the right-hand side of (27) as a function of $\rho$, for
a given pdf $p_H$. According to (3), $\rho$ can be varied in two
ways, either by varying $N_0$ for a fixed constellation $[\mathbb{X}, \mathbb{P}]$ or,
equivalently, by rescaling the alphabet $\mathbb{X}$ linearly for fixed $N_0$
and input distribution $\mathbb{P}$.

Martinez et al. [26] recognized the BICM decoder in Fig. 1
as a mismatched decoder and showed that the BICM-MI in
(27) corresponds to an achievable rate of such a decoder. This
means that reliable transmission using a BICM system at rate
$R_c$ is possible if $R_c \le I(\rho)$. Since from (3) $E_b^{\mathrm{r}}/N_0 = \rho/R_c$,
the inequality $R_c \le I(\rho)$ gives$^1$

\[ \frac{E_b^{\mathrm{r}}}{N_0} \ge \frac{\rho}{I(\rho)} \tag{28} \]

for any $\rho$. Focusing on the wideband regime, i.e., asymptotically
low SNR, we make the following definition.

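Definition 5 (Low-MI Parameters): The low-MI parameters
of a constellation $[\mathbb{X}, \mathbb{P}]$ are defined as $[\boldsymbol{\mu}, E_s, \alpha]$, where

\[ \boldsymbol{\mu} \triangleq \mathbb{E}[\boldsymbol{X}], \qquad E_s \triangleq \mathbb{E}[\|\boldsymbol{X}\|^2], \qquad \alpha \triangleq \left. \frac{dI(\rho)}{d\rho} \right|_{\rho = 0^+}. \]

In the wideband regime, the average bit energy-to-noise
ratio needed for reliable transmission is, using (28) and the
definition of $\alpha$, lower-bounded by

\[ \frac{E_b^{\mathrm{r}}}{N_0} \ge \lim_{\rho \to 0^+} \frac{\rho}{I(\rho)} = \frac{1}{\alpha}. \tag{29} \]

Furthermore, since in the wideband regime $E_b^{\mathrm{r}}/N_0 \ge \log_e 2 =
-1.59$ dB [14], $\alpha^{-1} \ge -1.59$ dB.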
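The conversion between the slope $\alpha$ and the wideband energy bound $1/\alpha$ in decibels is a one-liner. The sketch below uses the hypothetical helper name `ebn0_limit_db` and evaluates the Shannon limit $\log_e 2$ in dB.

```python
from math import log, log10

def ebn0_limit_db(alpha):
    """Wideband lower bound on the bit energy-to-noise ratio, 1/alpha, in dB."""
    return 10.0 * log10(1.0 / alpha)

shannon_limit_db = 10.0 * log10(log(2.0))   # log_e 2 expressed in dB
alpha_max = 1.0 / log(2.0)                  # log2(e): the slope that attains the limit

print(round(shannon_limit_db, 2))           # -1.59
print(round(ebn0_limit_db(alpha_max), 2))   # -1.59
print(round(ebn0_limit_db(1.10), 2))        # -0.41
```

A slope of 1.10 therefore corresponds to a gap of roughly 1.2 dB to the Shannon limit.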

The first-order behavior of the BICM-MI in (27) is fully
determined by $\alpha$, which, as we shall see later (e.g., in (39)), in
turn depends on $\boldsymbol{\mu}$ and $E_s$. This is why we designate this triplet
as low-MI parameters. The same definitions can be applied to
other MI functions $I(\rho)$ such as the coded modulation MI
(CM-MI) [20]. In this paper, however, we are only interested
in the BICM-MI.

The main contributions of this paper are to characterize the
low-MI parameters for arbitrary constellations, including those
with nonuniform distributions (Sec. III-C), and to identify the
set of constellations for BICM that maximize $\alpha$, i.e., minimize
$E_b^{\mathrm{r}}/N_0$ in the wideband regime (Sec. IV-B).

B. Low-MI Parameters for Uniform Distributions 

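The low-MI parameters $[\boldsymbol{\mu}, E_s, \alpha]$ have been analyzed in
detail for arbitrary input alphabets $\mathbb{X}$ under the assumption of
uniform probabilities [20]. Under this assumption, they can be
expressed as given by the following theorem.

Theorem 5: For a constellation $[\mathbb{X}, \mathbb{U}_m]$, the low-MI parameters
are

\[ \boldsymbol{\mu} = \frac{1}{M} \sum_{i=0}^{M-1} \boldsymbol{x}_i, \tag{30} \]

\[ E_s = \frac{1}{M} \sum_{i=0}^{M-1} \|\boldsymbol{x}_i\|^2, \tag{31} \]

\[ \alpha = \frac{\log_2 e}{M^2 E_s} \sum_{k=0}^{m-1} \left\| \sum_{i=0}^{M-1} (-1)^{n_{i,k}} \boldsymbol{x}_i \right\|^2. \tag{32} \]

Proof: Expressions (30) and (31) follow directly from
Definition 5, while (32) was proved in [20, eq. (50)]. □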



$^1$The definition of the related function $f(R_c)$ in [20, eq. (37)] is erroneous
and should read "$E_b^{\mathrm{r}}/N_0$ is bounded from below by $f(R_c)/\mathbb{E}_H[H^2]$, where
$f(R_c) \triangleq C^{-1}(R_c)/R_c$."






Preprint, March 1, 2012. 



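The low-MI parameters can be conveniently expressed as
functions of the HT $\tilde{\mathbb{X}}$ of the alphabet $\mathbb{X}$, as shown in the
following theorem.

Theorem 6: The low-MI parameters of a constellation $[\mathbb{X}, \mathbb{U}_m]$ can be expressed as

\[ \boldsymbol{\mu} = \tilde{\boldsymbol{x}}_0, \tag{33} \]

\[ E_s = \sum_{i=0}^{M-1} \|\tilde{\boldsymbol{x}}_i\|^2, \tag{34} \]

\[ \alpha = \frac{\log_2 e}{E_s} \sum_{k=0}^{m-1} \|\tilde{\boldsymbol{x}}_{2^k}\|^2. \tag{35} \]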

Proof: The expression d33l is obtained from (0, (l34l 
from ES eq. (16)], and (03 from [H3 Theorem 11]. □ 
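The agreement between the direct expressions and the HT-domain expressions for a uniform distribution can be checked numerically. The sketch below is restricted to a real-valued alphabet and uses illustrative function names; the HT-domain version reads the mean from HT row 0, the energy from the sum of squared HT rows, and the slope from rows $2^k$.

```python
from math import log2, e

def h(i, j):
    """Hadamard coefficient: sign given by the parity of popcount(i AND j)."""
    return -1 if bin(i & j).count("1") % 2 else 1

def params_direct(X):
    """Low-MI parameters of [X, U_m] computed directly from the alphabet."""
    M = len(X)
    m = M.bit_length() - 1
    mu = sum(X) / M
    Es = sum(x * x for x in X) / M
    total = sum(sum((-1) ** ((i >> k) & 1) * X[i] for i in range(M)) ** 2
                for k in range(m))
    return mu, Es, log2(e) * total / (M * M * Es)

def params_from_ht(X):
    """The same parameters computed from the HT rows of the alphabet."""
    M = len(X)
    m = M.bit_length() - 1
    Xt = [sum(X[j] * h(i, j) for j in range(M)) / M for i in range(M)]
    Es = sum(v * v for v in Xt)
    alpha = log2(e) * sum(Xt[2 ** k] ** 2 for k in range(m)) / Es
    return Xt[0], Es, alpha

X = [-3.0, -1.0, 3.0, 1.0]        # Gray-labeled 4-PAM in NBC row order
direct = params_direct(X)
via_ht = params_from_ht(X)
print([round(v, 4) for v in direct])   # [0.0, 5.0, 1.1542]
```

For this alphabet the slope equals $0.8 \log_2 e$, so the uniform Gray-labeled 4-PAM constellation does not reach the maximum slope $\log_2 e$.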

C. Low-MI Parameters for Nonuniform Distributions 

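Theorem 5 can be generalized to arbitrary constellations
$[\mathbb{X}, \mathbb{P}]$ as follows.

Theorem 7: For a constellation $[\mathbb{X}, \mathbb{P}]$, the low-MI parameters
are

\[ \boldsymbol{\mu} = \sum_{i=0}^{M-1} P_i \boldsymbol{x}_i, \tag{36} \]

\[ E_s = \sum_{i=0}^{M-1} P_i \|\boldsymbol{x}_i\|^2, \tag{37} \]

\[ \alpha = \frac{\log_2 e}{E_s} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} P_i P_j \langle \boldsymbol{x}_i, \boldsymbol{x}_j \rangle \sum_{k=0}^{m-1} (-1)^{n_{i,k} + n_{j,k}} \frac{P_{C_k}(\bar{n}_{i,k})}{P_{C_k}(n_{j,k})}. \tag{38} \]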
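Proof: Again, (36) and (37) follow from Definition 5,
while (38) requires some analysis. It was shown in
[20, Theorem 10] that

\[ \alpha = \frac{\log_2 e}{2 E_s} \sum_{k=0}^{m-1} \left[ \left\| \sum_{i=0}^{M-1} \frac{P_i \boldsymbol{x}_i}{\sqrt{P_{C_k}(n_{i,k})}} \right\|^2 + \left\| \sum_{i=0}^{M-1} (-1)^{n_{i,k}} \frac{P_i \boldsymbol{x}_i}{\sqrt{P_{C_k}(n_{i,k})}} \right\|^2 - 2 \|\boldsymbol{\mu}\|^2 \right]. \tag{39} \]

Substituting (36) and writing the squared norms as the inner
products of two identical vectors yields

\[ \alpha = \frac{\log_2 e}{2 E_s} \sum_{k=0}^{m-1} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} P_i P_j \langle \boldsymbol{x}_i, \boldsymbol{x}_j \rangle \left[ \frac{1 + (-1)^{n_{i,k} + n_{j,k}} - 2 \sqrt{P_{C_k}(n_{i,k}) P_{C_k}(n_{j,k})}}{\sqrt{P_{C_k}(n_{i,k}) P_{C_k}(n_{j,k})}} \right]. \]

The expression in brackets can be simplified as

\[ \begin{cases} \dfrac{2 \left( 1 - P_{C_k}(n_{i,k}) \right)}{P_{C_k}(n_{i,k})}, & n_{j,k} = n_{i,k}, \\ -2, & n_{j,k} \ne n_{i,k} \end{cases} \;=\; 2 (-1)^{n_{i,k} + n_{j,k}} \frac{P_{C_k}(\bar{n}_{i,k})}{P_{C_k}(n_{j,k})} \]

which completes the proof of (38). □

Theorem 7 shows that the low-MI parameters depend on the
input alphabet $\mathbb{X}$, the binary labeling (via $n_{i,k}$ in the expression
for $\alpha$), and the input distribution (via $P_{C_k}(u)$ and $P_i$).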
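For a concrete feel of how the three parameters depend on the bit probabilities, the sketch below evaluates them for the shaped 4-PAM constellation of Example 2. It assumes a real-valued alphabet and reads the slope as a double sum over symbol pairs weighted by $(-1)^{n_{i,k}+n_{j,k}} P_{C_k}(\bar{n}_{i,k})/P_{C_k}(n_{j,k})$, summed over the bit positions; the function names are hypothetical.

```python
from math import log2, e

def pc(b, k, u):
    """P_{C_k}(u) for bit probabilities b[k] = P_{C_k}(0)."""
    return b[k] if u == 0 else 1.0 - b[k]

def symbol_prob(i, b):
    p = 1.0
    for k in range(len(b)):
        p *= pc(b, k, (i >> k) & 1)
    return p

def low_mi_parameters(X, b):
    """Mean, symbol energy, and slope for a real-valued constellation [X, P]."""
    M, m = len(X), len(b)
    P = [symbol_prob(i, b) for i in range(M)]
    mu = sum(P[i] * X[i] for i in range(M))
    Es = sum(P[i] * X[i] ** 2 for i in range(M))
    total = 0.0
    for i in range(M):
        for j in range(M):
            weight = 0.0
            for k in range(m):
                ni, nj = (i >> k) & 1, (j >> k) & 1
                weight += (-1) ** (ni + nj) * pc(b, k, 1 - ni) / pc(b, k, nj)
            total += P[i] * P[j] * X[i] * X[j] * weight
    return mu, Es, log2(e) * total / Es

mu, Es, alpha = low_mi_parameters([-3.0, -1.0, 3.0, 1.0], [0.35, 0.50])
print(round(mu, 3), round(Es, 3), round(alpha, 3))

# With equiprobable bits the same routine gives the uniform-distribution slope.
_, _, alpha_uniform = low_mi_parameters([-3.0, -1.0, 3.0, 1.0], [0.5, 0.5])
```

For these inputs the energy is 3.8 per dimension and the slope is about 1.097, lower than the value $0.8 \log_2 e \approx 1.154$ obtained with equiprobable bits.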

While the low-MI parameters of an alphabet $\mathbb{X}$ with uniform
probabilities are conveniently expressed in terms of its HT $\tilde{\mathbb{X}}$
(cf. Theorem 6), no similar expressions are known for the
low-MI parameters of a general constellation in (36)-(38). This has
so far prevented the analytic optimization of such constellations.
The new transform introduced in Section II-E, however,
solves this problem by establishing an equivalence between an
arbitrary constellation, possibly with nonuniform probabilities,
and another constellation with uniform probabilities.

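Theorem 8: The low-MI parameters $[\boldsymbol{\mu}, E_s, \alpha]$ of any
constellation $[\mathbb{X}, \mathbb{P}]$ are equal to the low-MI parameters of $[\check{\mathbb{X}}, \mathbb{U}_m]$.

Proof: Let the low-MI parameters of $[\check{\mathbb{X}}, \mathbb{U}_m]$ be denoted
by $[\boldsymbol{\mu}', E_s', \alpha']$. First, (30) and (18) yield

\[ \boldsymbol{\mu}' = \frac{1}{M} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \boldsymbol{x}_j g_{i,j} \sqrt{P_j} = \frac{1}{M} \sum_{j=0}^{M-1} \boldsymbol{x}_j \sqrt{P_j} \sum_{i=0}^{M-1} g_{i,j}. \tag{40} \]

Applying (15) to the inner sum in (40) reveals that

\[ \boldsymbol{\mu}' = \sum_{j=0}^{M-1} P_j \boldsymbol{x}_j = \boldsymbol{\mu}. \]

Second, (31) and (18) yield

\[ \begin{aligned} E_s' &= \frac{1}{M} \sum_{i=0}^{M-1} \|\check{\boldsymbol{x}}_i\|^2 = \frac{1}{M} \sum_{i=0}^{M-1} \left\langle \sum_{j=0}^{M-1} \boldsymbol{x}_j g_{i,j} \sqrt{P_j}, \sum_{l=0}^{M-1} \boldsymbol{x}_l g_{i,l} \sqrt{P_l} \right\rangle \\ &= \frac{1}{M} \sum_{j=0}^{M-1} \sum_{l=0}^{M-1} \sqrt{P_j P_l} \langle \boldsymbol{x}_j, \boldsymbol{x}_l \rangle \sum_{i=0}^{M-1} g_{i,j} g_{i,l}. \end{aligned} \]

The inner sum is simplified using (13), which yields

\[ E_s' = \sum_{j=0}^{M-1} P_j \|\boldsymbol{x}_j\|^2 = E_s. \]

For the third and last part of the theorem, we obtain from
(32) that

\[ \alpha' = \frac{\log_2 e}{M^2 E_s} \sum_{k=0}^{m-1} \left\| \sum_{i=0}^{M-1} (-1)^{n_{i,k}} \check{\boldsymbol{x}}_i \right\|^2 \tag{41} \]

where $\check{\boldsymbol{x}}_i$ is given by (18). The inner sum can be expanded as

\[ \sum_{i=0}^{M-1} (-1)^{n_{i,k}} \check{\boldsymbol{x}}_i = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} (-1)^{n_{i,k}} \boldsymbol{x}_j g_{i,j} \sqrt{P_j} = \sum_{j=0}^{M-1} \boldsymbol{x}_j \sqrt{P_j} \sum_{i=0}^{M-1} (-1)^{n_{i,k}} g_{i,j}. \tag{42} \]

Fig. 3. 16-QAM constellation $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$ (black circles) with bit probabilities
$\boldsymbol{b}_1 = [0.35, 0.50, 0.35, 0.50]$. Each symbol $\boldsymbol{x}_j$ is marked with its index
$j$, and its probability $P_j$ is proportional to the area of the corresponding circle.
White circles represent the transformed constellation $[\check{\mathbb{X}}_{\mathrm{QAM}}, \mathbb{U}_4]$, which has
the same low-MI parameters.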



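Applying (16) to the inner sum, we obtain

\[ \sum_{i=0}^{M-1} (-1)^{n_{i,k}} \check{\boldsymbol{x}}_i = M \sum_{j=0}^{M-1} P_j \boldsymbol{x}_j (-1)^{n_{j,k}} \sqrt{\frac{P_{C_k}(\bar{n}_{j,k})}{P_{C_k}(n_{j,k})}}. \]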

D. Numerical Examples 

In this section, we show examples of how the transform
defined in Sec. II-E works and we also present equivalent
constellations $[\mathbb{X}, \mathbb{P}]$ and $[\check{\mathbb{X}}, \mathbb{U}_m]$. All results are for the
AWGN channel. The MIs are numerically evaluated using
Gauss-Hermite quadratures following [27, Sec. III].

Example 4: Consider the equally spaced, Gray-labeled
16-ary square quadrature amplitude modulation (16-QAM)
alphabet $\mathbb{X} = \mathbb{X}_{\mathrm{QAM}}$ with bit probabilities $\boldsymbol{b}_1 =
[0.35, 0.5, 0.35, 0.5]$, shown with black circles in Fig. 3. The
input distribution $\mathbb{P} = \mathbb{P}_1$ given by (6) is $P_0 = P_2 = P_8 =
P_{10} = 0.031$, $P_1 = P_3 = P_4 = P_6 = P_9 = P_{11} = P_{12} =
P_{14} = 0.057$, and $P_5 = P_7 = P_{13} = P_{15} = 0.106$. These
symbol probabilities are indicated in Fig. 3, where the area
of the circle representing the symbol $\boldsymbol{x}_j$ is proportional to the
corresponding probability $P_j$.

Another alphabet $\check{\mathbb{X}}_{\mathrm{QAM}}$ is obtained by applying the transform
in (18) to the constellation $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$. The white circles
in Fig. 3 represent the symbols in $\check{\mathbb{X}}_{\mathrm{QAM}}$ using a uniform
distribution $\mathbb{U}_4$. The alphabet $\check{\mathbb{X}}_{\mathrm{QAM}}$ is still a rectangular
16-QAM constellation, but a nonuniformly spaced one. Every
row in Fig. 3 can be regarded as a probabilistically shaped
(black) or a geometrically shaped (white) 4-PAM constellation;
indeed, the same 4-PAM constellations as were studied in
Example 2.

The low-MI parameters of the two constellations
$[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$ and $[\check{\mathbb{X}}_{\mathrm{QAM}}, \mathbb{U}_4]$, given by Theorems 7 and
5, resp., are identical, as predicted by Theorem 8. These are

\[ \boldsymbol{\mu} = \boldsymbol{0}, \qquad E_s = 7.60, \qquad \alpha = 1.10. \tag{43} \]

We take the inner product of this vector with itself and
substitute the obtained expression for the squared norm in (41).
This yields, after rearranging terms,

\[ \alpha' = \frac{\log_2 e}{E_s} \sum_{k=0}^{m-1} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} P_i P_j \langle \boldsymbol{x}_i, \boldsymbol{x}_j \rangle (-1)^{n_{i,k} + n_{j,k}} \sqrt{\frac{P_{C_k}(\bar{n}_{i,k})}{P_{C_k}(n_{i,k})} \cdot \frac{P_{C_k}(\bar{n}_{j,k})}{P_{C_k}(n_{j,k})}}. \]

The square root is $P_{C_k}(\bar{n}_{i,k}) / P_{C_k}(n_{i,k})$ if $n_{j,k} = n_{i,k}$ or
1 if $n_{j,k} = \bar{n}_{i,k}$. In both cases, it can be expressed as
$P_{C_k}(\bar{n}_{i,k}) / P_{C_k}(n_{j,k})$ (or, equivalently, $P_{C_k}(\bar{n}_{j,k}) / P_{C_k}(n_{i,k})$).
Comparing this result with (38) verifies that $\alpha' = \alpha$. □

The result in Theorem 8 shows that the constellation $[\mathbb{X}, \mathbb{P}]$
can be mapped to the constellation $[\check{\mathbb{X}}, \mathbb{U}_m]$, which has the
same low-MI parameters as $[\mathbb{X}, \mathbb{P}]$. The new constellation
$[\check{\mathbb{X}}, \mathbb{U}_m]$ uses a uniform input distribution and its input alphabet
$\check{\mathbb{X}}$ is related to $\mathbb{X}$ via (18) and (19).

To conclude this section, we summarize in Table I the
low-MI parameters for BICM given by Theorems 6 and 7. The
equivalence of the parameters for $[\mathbb{X}, \mathbb{P}]$ or $[\check{\mathbb{X}}, \mathbb{U}_m]$ comes
from Theorem 8.
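The equivalence between a shaped constellation and its transformed, uniformly distributed counterpart can be checked numerically. The sketch below does so for a real-valued alphabet with arbitrarily chosen points and bit probabilities. It assumes the per-bit product form of $g_{i,j}$ and the double-sum reading of the nonuniform slope used in the derivation above; all names and the test alphabet are illustrative.

```python
from math import sqrt, log2, e

def pc(b, k, u):
    """P_{C_k}(u) for bit probabilities b[k] = P_{C_k}(0)."""
    return b[k] if u == 0 else 1.0 - b[k]

def symbol_prob(i, b):
    p = 1.0
    for k in range(len(b)):
        p *= pc(b, k, (i >> k) & 1)
    return p

def g(i, j, b):
    """Assumed per-bit product form of the transform coefficients."""
    prod = 1.0
    for k in range(len(b)):
        ni, nj = (i >> k) & 1, (j >> k) & 1
        sign = -1.0 if ni != nj else 1.0
        prod *= sqrt(pc(b, k, nj)) + sign * sqrt(pc(b, k, 1 - nj))
    return prod

def transform(X, b):
    """Transformed alphabet: x_check_i = sum_j x_j g_{i,j} sqrt(P_j)."""
    M = len(X)
    return [sum(X[j] * g(i, j, b) * sqrt(symbol_prob(j, b)) for j in range(M))
            for i in range(M)]

def params_nonuniform(X, b):
    """Mean, energy, and slope of [X, P] for a real-valued alphabet."""
    M, m = len(X), len(b)
    P = [symbol_prob(i, b) for i in range(M)]
    mu = sum(P[i] * X[i] for i in range(M))
    Es = sum(P[i] * X[i] ** 2 for i in range(M))
    total = 0.0
    for i in range(M):
        for j in range(M):
            w = sum((-1) ** (((i >> k) & 1) + ((j >> k) & 1))
                    * pc(b, k, 1 - ((i >> k) & 1)) / pc(b, k, (j >> k) & 1)
                    for k in range(m))
            total += P[i] * P[j] * X[i] * X[j] * w
    return mu, Es, log2(e) * total / Es

def params_uniform(X):
    """Mean, energy, and slope of [X, U_m] for a real-valued alphabet."""
    M = len(X)
    m = M.bit_length() - 1
    Es = sum(x * x for x in X) / M
    total = sum(sum((-1) ** ((i >> k) & 1) * X[i] for i in range(M)) ** 2
                for k in range(m))
    return sum(X) / M, Es, log2(e) * total / (M * M * Es)

b = [0.3, 0.7, 0.7]
X = [0.5, -1.2, 2.0, 0.1, -0.7, 1.5, -2.2, 0.9]   # arbitrary 8-point alphabet
shaped = params_nonuniform(X, b)
equivalent = params_uniform(transform(X, b))
print([round(v, 6) for v in shaped])
print([round(v, 6) for v in equivalent])
```

The two printed triplets should coincide up to floating-point rounding, although the BICM-MI curves of the two constellations differ away from the low-SNR limit.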



The BICM-MI for the constellations $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$ and
$[\check{\mathbb{X}}_{\mathrm{QAM}}, \mathbb{U}_4]$ are shown in Fig. 4. In this figure, we also
show the capacity of the AWGN channel and the CM-MI and
BICM-MI for 16-QAM using a uniform input distribution,
i.e., $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{U}_4]$. The results show that the BICM-MIs of
the original constellation $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$ and the transformed
constellation $[\check{\mathbb{X}}_{\mathrm{QAM}}, \mathbb{U}_4]$ are in general different; however,
they converge in the low-SNR regime. The endpoints of the
BICM-MI curves for $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$ and $[\check{\mathbb{X}}_{\mathrm{QAM}}, \mathbb{U}_4]$ are shown
with a white circle, whose value follows from (43) and (29).
The endpoint for the BICM-MI curve for the constellation
$[\mathbb{X}_{\mathrm{QAM}}, \mathbb{U}_4]$ is shown with a white square [16, eq. (18)].

Example 5: Consider the NBC-labeled $M$-ary phase-shift
keying (PSK) alphabet $\mathbb{X} = \mathbb{X}_{\mathrm{PSK}}$, where $\boldsymbol{x}_j = [\cos(2\pi j/M +
\pi/M), \sin(2\pi j/M + \pi/M)]$ with $j = 0, \ldots, M-1$. The
constellations for $M = 8$ and different input distributions
$\mathbb{P}$ are shown in Fig. 5, where again the circle areas are
proportional to the symbol probabilities $P_j$. We denote these
input distributions by $\mathbb{P}_2$, $\mathbb{P}_3$, $\mathbb{P}_4$, and $\mathbb{P}_5$, which are generated
by $\boldsymbol{b}_2 = [0.5, 0.6, 0.5]$, $\boldsymbol{b}_3 = [0.5, 0.7, 0.9]$, $\boldsymbol{b}_4 = [0.3, 0.7, 0.7]$,
and $\boldsymbol{b}_5 = [0.9, 0.7, 0.3]$, resp. The transforms $\check{\mathbb{X}}_{\mathrm{PSK}}$ are
irregular, not resembling a PSK alphabet. Nevertheless, the
low-MI parameters for pairs of constellations $[\mathbb{X}_{\mathrm{PSK}}, \mathbb{P}_i]$ and
$[\check{\mathbb{X}}_{\mathrm{PSK}}, \mathbb{U}_3]$ are again equal. Particularly, the parameters $\alpha =$
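TABLE I

LOW-MI PARAMETERS AND FOO CONDITIONS FOR BICM USING UNIFORM AND NONUNIFORM INPUT DISTRIBUTIONS. THE RESULTS FOR $[\mathbb{X}, \mathbb{U}_m]$ ARE FROM [20] (CF. THEOREMS 6 AND 9) AND THE ONES FOR $[\mathbb{X}, \mathbb{P}]$ OR $[\check{\mathbb{X}}, \mathbb{U}_m]$ ARE FROM THEOREMS 7, 8, AND 11.

| Parameter | $[\mathbb{X}, \mathbb{U}_m]$ | $[\mathbb{X}, \mathbb{P}]$ or $[\check{\mathbb{X}}, \mathbb{U}_m]$ |
| $\boldsymbol{\mu}$ | $\tilde{\boldsymbol{x}}_0$ | $\sum_{i=0}^{M-1} P_i \boldsymbol{x}_i$ |
| $E_s$ | $\sum_{i=0}^{M-1} \|\tilde{\boldsymbol{x}}_i\|^2$ | $\sum_{i=0}^{M-1} P_i \|\boldsymbol{x}_i\|^2$ |
| $\alpha$ | $\frac{\log_2 e}{E_s} \sum_{k=0}^{m-1} \|\tilde{\boldsymbol{x}}_{2^k}\|^2$ | $\frac{\log_2 e}{E_s} \sum_{i=0}^{M-1} P_i \sum_{j=0}^{M-1} P_j \langle \boldsymbol{x}_i, \boldsymbol{x}_j \rangle \sum_{k=0}^{m-1} (-1)^{n_{i,k}+n_{j,k}} \frac{P_{C_k}(\bar{n}_{i,k})}{P_{C_k}(n_{j,k})}$ |
| FOO condition | $\tilde{\boldsymbol{x}}_j = \boldsymbol{0}$, $\forall j \notin \{1, 2, 4, \ldots, M/2\}$ | $\boldsymbol{\mu} = \boldsymbol{0}$ and $\tilde{\boldsymbol{x}}_j = \boldsymbol{0}$, $\forall j \notin \{0\} \cup \{1, 2, 4, \ldots, M/2\}$ |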



[Plot: horizontal axis $E_b^{\mathrm{r}}/N_0$ [dB], from -2 to 10. Curves: CM $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{U}_4]$; BICM $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{U}_4]$; BICM $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$; BICM $[\check{\mathbb{X}}_{\mathrm{QAM}}, \mathbb{U}_4]$.]

Fig. 4. BICM-MI for the constellations $[\mathbb{X}_{\mathrm{QAM}}, \mathbb{P}_1]$ and $[\check{\mathbb{X}}_{\mathrm{QAM}}, \mathbb{U}_4]$. The
AWGN capacity $C^{\mathrm{AW}} = \log_2(1 + E_s/N_0)$ and the CM-MI and BICM-MI
for uniform input distributions are also shown. The white circle and square
show the value of (29) with $\alpha$ given by (43) and by [20, eq. (55)], resp.

$\alpha_i$ for the four constellations $[\mathbb{X}_{\mathrm{PSK}}, \mathbb{P}_i]$ are

\[ \alpha_2 = 0.64, \quad \alpha_3 = 0.67, \quad \alpha_4 = 0.72, \quad \alpha_5 = 0.76. \tag{44} \]

The BICM-MI for three of the 8-PSK constellations in Fig. 5 is shown in Fig. 6. We also show the capacity of the AWGN channel and the CM-MI and the BICM-MI for the 8-PSK alphabet using uniform input distributions. Again, the results show that the BICM-MIs of the original constellation $[\mathbb{X}_{\mathrm{PSK}}, \mathbb{P}_i]$ and of the corresponding transformed constellation with $\mathbb{U}_3$ are in general different, but they converge in the low-SNR regime. The endpoints of the BICM-MI curves are obtained from (44) used in (29), except for $[\mathbb{X}_{\mathrm{PSK}}, \mathbb{U}_3]$, whose endpoint is given by [20, eq. (56)] and (25).
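As a hedged illustration of how the constellations in Example 5 can be set up numerically, the following Python sketch builds the 8-PSK alphabet and the symbol probabilities generated by a vector of bit probabilities. It assumes that the entries of $\boldsymbol{b}$ are the probabilities of bit value 0, that bit $k = 0$ is the least significant bit of the NBC label, and that symbol probabilities are products of bit probabilities. The function names are hypothetical and chosen only for this sketch.

```python
import math

def psk_alphabet(M):
    """Points x_j = [cos(2*pi*j/M + pi/M), sin(2*pi*j/M + pi/M)], j = 0..M-1."""
    return [(math.cos(2 * math.pi * j / M + math.pi / M),
             math.sin(2 * math.pi * j / M + math.pi / M)) for j in range(M)]

def symbol_probs(b):
    """P_i = prod_k P_{C_k}(n_{i,k}); ASSUMES b[k] = P_{C_k}(0) and bit 0 = LSB."""
    m = len(b)
    probs = []
    for i in range(2 ** m):
        p = 1.0
        for k in range(m):
            bit = (i >> k) & 1
            p *= b[k] if bit == 0 else 1.0 - b[k]
        probs.append(p)
    return probs

def mean_and_energy(X, P):
    """Mean sum_i P_i x_i and average energy sum_i P_i ||x_i||^2."""
    mu = tuple(sum(p * x[d] for p, x in zip(P, X)) for d in range(2))
    Es = sum(p * (x[0] ** 2 + x[1] ** 2) for p, x in zip(P, X))
    return mu, Es

if __name__ == "__main__":
    X = psk_alphabet(8)
    for name, b in [("b2", [0.5, 0.6, 0.5]), ("b3", [0.5, 0.7, 0.9]),
                    ("b4", [0.3, 0.7, 0.7]), ("b5", [0.9, 0.7, 0.3])]:
        P = symbol_probs(b)
        mu, Es = mean_and_energy(X, P)
        print(name, [round(p, 4) for p in P], [round(c, 4) for c in mu], round(Es, 4))
```

The sketch only reproduces the input side of the example (alphabet, probabilities, mean, and energy); it does not compute the low-MI parameter $\alpha$ or the BICM-MI.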

Example 6: Consider the eight-level star-shaped QAM alphabet shown in Fig. 7 (black circles), which we denote by $\mathbb{X}_{\mathrm{8QAM}}$. This alphabet is used with bit probabilities $\boldsymbol{b}_6 = [0.5, 0.5, 0.85]$, giving an input symbol probability $\mathbb{P}_6$.




[Figure: MI curves versus $E_b/N_0$ [dB].]

Fig. 6. BICM-MI for the constellations $[\mathbb{X}_{\mathrm{PSK}}, \mathbb{P}_i]$ with $i = 3, 4, 5$ (cf. Fig. 5) and their transforms with $\mathbb{U}_3$. The AWGN capacity $C^{\mathrm{AW}} = \log_2(1 + E_s/N_0)$ and the CM-MI and BICM-MI for uniform input distributions are also shown. The white circles and square show the endpoint $\alpha^{-1}$ with $\alpha$ given by (44) and by [20, eq. (56)], resp.



In this figure we also show the transformed constellation with uniform distribution $\mathbb{U}_3$, which according to Theorem 8 has the same low-MI parameters as $[\mathbb{X}_{\mathrm{8QAM}}, \mathbb{P}_6]$. This can be appreciated in Fig. 8, where the corresponding BICM-MIs are shown. Fig. 8 also shows how probabilistic shaping improves the BICM-MI considerably over a wide range of SNRs.

The results in Figs. 4, 6, and 8 also show other interesting properties of probabilistic shaping for BICM. In the high-SNR regime, the use of a nonuniform distribution results in a loss in MI, i.e., the curves flatten out at a value below $m$ [bit/symbol], but for a wide range of moderately high SNRs, the BICM-MI is higher with probabilistic shaping. For the 16-QAM alphabet in Example 4, in the medium-SNR regime, the use of nonequally likely symbols even gives a larger MI than the one obtained by CM and a uniform distribution.

Preprint, March 1, 2012.

[Fig. 5: 8-PSK constellations $[\mathbb{X}_{\mathrm{PSK}}, \mathbb{P}_i]$ in four panels, (a) $\mathbb{P}_2$, (b) $\mathbb{P}_3$, (c) $\mathbb{P}_4$, (d) $\mathbb{P}_5$; both axes range from $-1.5$ to $1.5$.]

IV. First-order Optimal Constellations 

Having characterized the low-SNR behavior of the BICM- 
MI of an arbitrary constellation, the next step is to search for 
optimal constellations in terms of the BICM-MI at low SNR. 
The following definition formally defines BICM systems that 
achieve the SL. 

Definition 6 (FOO constellation): The constellation $[\mathbb{X}, \mathbb{P}]$ is said to be first-order optimal (FOO) if a BICM system using $[\mathbb{X}, \mathbb{P}]$ achieves the SL -1.59 dB, i.e., $\alpha = \log_2 e$.

Note that although the definition of FOO constellations seems to depend only on $\alpha$, all three low-MI parameters influence the first-order asymptotics. This can be seen from Theorem 6, where $\alpha$ depends on $E_s$, and $E_s$ depends on $\boldsymbol{\mu}$.

As discussed in Sec. III-B, we have fixed the labeling to be the NBC, and thus an FOO constellation is fully characterized by only two parameters, the input alphabet $\mathbb{X}$ and its input distribution $\mathbb{P}$, where the latter satisfies (6). The analysis can be straightforwardly generalized to an arbitrary labeling by permuting the constellation; see Sec. III-B.

A. FOO Constellations for Uniform Distributions 

In this section we review results on FOO constellations for 
BICM for uniform input distributions. The next theorem gives 
necessary and sufficient conditions for an input alphabet to be 
FOO if the binary labeling is the NBC and input distribution 
is uniform. 

Theorem 9: The constellation $[\mathbb{X}, \mathbb{U}_m]$ is FOO if and only if

\[ \tilde{\boldsymbol{x}}_j = \boldsymbol{0}, \quad \forall j \notin \{1, 2, 4, \ldots, M/2\}. \tag{45} \]

Proof: From (34) and (35), we obtain $\alpha = \log_2 e$ if and only if

\[ \sum_{i=0}^{M-1} \|\tilde{\boldsymbol{x}}_i\|^2 = \sum_{k=0}^{m-1} \|\tilde{\boldsymbol{x}}_{2^k}\|^2, \]

which gives (45). □

Fig. 7. Star-shaped 8-QAM constellation $[\mathbb{X}_{\mathrm{8QAM}}, \mathbb{P}_6]$ (black circles) with bit probabilities $\boldsymbol{b}_6 = [0.5, 0.5, 0.85]$. White circles represent the transformed constellation with $\mathbb{U}_3$, which has the same low-MI parameters.

Fig. 8. BICM-MI for the constellations $[\mathbb{X}_{\mathrm{8QAM}}, \mathbb{U}_3]$, $[\mathbb{X}_{\mathrm{8QAM}}, \mathbb{P}_6]$, and the transform of the latter with $\mathbb{U}_3$. The AWGN capacity $C^{\mathrm{AW}} = \log_2(1 + E_s/N_0)$ is also shown. The white circle and square show the endpoints $\alpha^{-1}$ with $\alpha = 1.14$ and $1.18$.

This theorem was given in [20, Theorem 12], where it was used to find FOO constellations for BICM when $\mathbb{P} = \mathbb{U}_m$. It offers an appealing intuitive geometrical interpretation: An input alphabet is FOO for a uniform input distribution if and only if it is a linear projection of a zero-mean hypercube. This behavior is illustrated in Example 7 and also in [20, Fig. 4].

In the following section, Theorem 9 is generalized to nonuniform input distributions, which has not been done before.
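The condition of Theorem 9 can be checked numerically for one-dimensional alphabets. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the HT is $\tilde{x}_i = \frac{1}{M}\sum_j x_j h_{i,j}$ with $h_{i,j} = \prod_k (-1)^{n_{i,k} n_{j,k}}$, where $n_{i,k}$ is bit $k$ of the index $i$ (least significant bit first). The function names are hypothetical.

```python
def hadamard_transform(X):
    """HT of a real alphabet X of length M = 2^m (assumed 1/M normalization)."""
    M = len(X)
    out = []
    for i in range(M):
        acc = 0.0
        for j in range(M):
            # h_{i,j} = (-1)^(number of bit positions where both i and j have a 1)
            sign = -1 if bin(i & j).count("1") % 2 else 1
            acc += sign * X[j]
        out.append(acc / M)
    return out

def is_foo_uniform(X, tol=1e-12):
    """Theorem 9 check: HT must vanish for every index that is not a power of two."""
    M = len(X)
    powers_of_two = {1 << k for k in range(M.bit_length() - 1)}
    Xt = hadamard_transform(X)
    return all(abs(Xt[j]) <= tol for j in range(M) if j not in powers_of_two)

NBC_8PAM = [-7, -5, -3, -1, 1, 3, 5, 7]    # NBC-labeled 8-PAM
GRAY_8PAM = [-7, 7, -1, 1, -5, 5, -3, 3]   # Gray-labeled 8-PAM from Example 8

if __name__ == "__main__":
    print(hadamard_transform(NBC_8PAM), is_foo_uniform(NBC_8PAM))
    print(hadamard_transform(GRAY_8PAM), is_foo_uniform(GRAY_8PAM))
```

Under these assumptions the NBC-labeled 8-PAM alphabet passes the check, whereas the Gray-labeled one fails because its HT coefficient at index 3 is nonzero, which agrees with the statements in Example 8.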

B. FOO Constellations for Nonuniform Distributions 

In this section, we derive necessary and sufficient conditions for a BICM system, with an arbitrary input alphabet and probability distribution, to achieve the SL, i.e., we find FOO constellations for BICM. The conditions are derived by transforming an arbitrary constellation into another constellation with uniform probabilities using Theorem 8 and applying Theorem 9 to this transformed constellation. Since Theorem 9 is expressed in terms of the HT, a relation between the HTs of $\mathbb{X}$ and of its transform is needed, which is illustrated by the bottom arrow in Fig. 2. Such a relation is provided by Theorem 4 and will be applied in the proofs of Theorems 10 and 11.

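Theorem 10: The constellation $[\mathbb{X}, \mathbb{P}]$ is FOO if and only if the HT $\tilde{\mathbb{X}}$ of $\mathbb{X}$ satisfies both of the following conditions:

\[ \tilde{\boldsymbol{x}}_0 + \sum_{k=0}^{m-1} \tilde{\boldsymbol{x}}_{2^k} \left( P_{C_k}(0) - P_{C_k}(1) \right) = \boldsymbol{0}, \tag{46} \]

\[ \tilde{\boldsymbol{x}}_j = \boldsymbol{0}, \quad \forall j \notin \{0\} \cup \{1, 2, 4, \ldots, M/2\}. \tag{47} \]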



Proof: We will prove the theorem in two steps. First, we prove the "if" part by showing that (46)-(47) imply that $[\mathbb{X}, \mathbb{P}]$ is FOO. Second, we prove the "only if" part by showing that if $[\mathbb{X}, \mathbb{P}]$ is FOO, then (46)-(47) hold.

For the "if" part, suppose that (46)-(47) hold for a given constellation $[\mathbb{X}, \mathbb{P}]$. Applying (47) in (20) yields for the HT $\tilde{\mathbb{S}}$ of $\mathbb{S}$, the transform of $\mathbb{X}$,

\[ \tilde{\boldsymbol{s}}_j = \psi_j \Bigg[ \tilde{\boldsymbol{x}}_0 \prod_{\substack{k=0 \\ n_{j,k} \neq n_{0,k}}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(n_{0,k}) \right) + \sum_{l=0}^{m-1} \tilde{\boldsymbol{x}}_{2^l} \prod_{\substack{k=0 \\ n_{j,k} \neq n_{2^l,k}}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(n_{2^l,k}) \right) \Bigg], \quad j = 0, \ldots, M-1. \tag{48} \]

Since the bits $n_{0,k} = 0$ for $k = 0, \ldots, m-1$, the first product in (48) is always zero, except when $j = 0$. (Recall that a product over $\varnothing$ in (20) was defined as 1.) Furthermore, because $n_{2^l,k} = 0$ for all $k \neq l$, the second product in (48) is zero whenever $j \notin \{0, 2^l\}$ for some integer $l$. We can therefore identify three cases for (48). First,



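\[ \tilde{\boldsymbol{s}}_j = \boldsymbol{0}, \quad j \notin \{0\} \cup \{1, 2, \ldots, 2^{m-1}\}. \tag{49} \]

Second, $j = 0$ yields

\[ \tilde{\boldsymbol{s}}_0 = \psi_0 \Bigg[ \tilde{\boldsymbol{x}}_0 + \sum_{l=0}^{m-1} \tilde{\boldsymbol{x}}_{2^l} \left( P_{C_l}(0) - P_{C_l}(1) \right) \Bigg] = \boldsymbol{0} \tag{50} \]

because of (46). And third, letting $j = 2^i$ for an integer $i$,

\[ \tilde{\boldsymbol{s}}_{2^i} = \psi_{2^i} \sum_{l=0}^{m-1} \tilde{\boldsymbol{x}}_{2^l} \prod_{\substack{k=0 \\ n_{2^i,k} \neq n_{2^l,k}}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(n_{2^l,k}) \right), \quad i = 0, \ldots, m-1. \tag{51} \]

When $l \neq i$, the product in (51) includes two factors, $k = l$ and $k = i$. For one of them, $k = i$, $n_{2^l,k} = 0$ and the whole product is 0. Therefore, only $l = i$ contributes to the sum in (51). When $l = i$, the product in (51) is again over $\varnothing$ and (51) becomes

\[ \tilde{\boldsymbol{s}}_{2^i} = \psi_{2^i} \tilde{\boldsymbol{x}}_{2^i}. \tag{52} \]

Combining the three cases (49), (50), and (52) yields

\[ \tilde{\boldsymbol{s}}_j = \begin{cases} \psi_j \tilde{\boldsymbol{x}}_j, & j \in \{1, 2, \ldots, 2^{m-1}\}, \\ \boldsymbol{0}, & \text{otherwise}. \end{cases} \]

Using Theorem 9, we conclude that the constellation $[\mathbb{S}, \mathbb{U}_m]$ is FOO. Finally, Theorem 8 implies that $[\mathbb{X}, \mathbb{P}]$ is also FOO, which completes the proof of the "if" part.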

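For the "only if" part, assume that $[\mathbb{X}, \mathbb{P}]$ is FOO. By Theorem 8, $[\mathbb{S}, \mathbb{U}_m]$ is also FOO, and by Theorem 9, $\tilde{\boldsymbol{s}}_i = \boldsymbol{0}$ for any $i$ that is not a power of two, where $\tilde{\mathbb{S}}$ is the HT of $\mathbb{S}$, the transform of $\mathbb{X}$. We will now use Theorem 4 to translate the condition on $\tilde{\mathbb{S}}$ into conditions on $\tilde{\mathbb{X}}$.

If $\tilde{\boldsymbol{s}}_i = \boldsymbol{0}$ for $i \notin \{1, 2, \ldots, 2^{m-1}\}$, then the summation over $i = 0, \ldots, M-1$ in (21) can be reduced to a summation over $i = 2^l$ for $l = 0, \ldots, m-1$,

\[ \tilde{\boldsymbol{x}}_j = \sum_{l=0}^{m-1} \frac{\tilde{\boldsymbol{s}}_{2^l}}{\psi_{2^l}} \prod_{\substack{k=0 \\ n_{j,k} \neq n_{2^l,k}}}^{m-1} \left( P_{C_k}(n_{2^l,k}) - P_{C_k}(0) \right), \quad j = 0, \ldots, M-1. \tag{53} \]

Because $n_{2^l,k} = 0$ for $k \neq l$, the product in (53) can be nonzero only if the product is over $k \in \varnothing$ or over the single-element set $k \in \{l\}$, i.e., if $j = 2^l$ or $j = 0$ for some integer $l$, resp. Again, three cases can be identified. First, if $j \notin \{0\} \cup \{1, 2, \ldots, 2^{m-1}\}$, then the product in (53) includes at least one $k \neq l$ for every $l$. Hence

\[ \tilde{\boldsymbol{x}}_j = \boldsymbol{0}, \quad j \notin \{0\} \cup \{1, 2, \ldots, 2^{m-1}\}. \tag{54} \]

Second, for $j = 0$, the product comprises only one factor, $k = l$, and

\[ \tilde{\boldsymbol{x}}_0 = \sum_{l=0}^{m-1} \frac{\tilde{\boldsymbol{s}}_{2^l}}{\psi_{2^l}} \left( P_{C_l}(1) - P_{C_l}(0) \right). \tag{55} \]

And third, setting $j = 2^i$,

\[ \tilde{\boldsymbol{x}}_{2^i} = \sum_{l=0}^{m-1} \frac{\tilde{\boldsymbol{s}}_{2^l}}{\psi_{2^l}} \prod_{\substack{k=0 \\ n_{2^i,k} \neq n_{2^l,k}}}^{m-1} \left( P_{C_k}(n_{2^l,k}) - P_{C_k}(0) \right), \quad i = 0, \ldots, m-1. \tag{56} \]

This product is similar to the product in (51), and, as explained after (51), it is 1 if $l = i$ (product over $\varnothing$) and 0 otherwise. Thus, the summation in (56) can be reduced to just one term, $l = i$:

\[ \tilde{\boldsymbol{x}}_{2^i} = \frac{\tilde{\boldsymbol{s}}_{2^i}}{\psi_{2^i}}, \quad i = 0, \ldots, m-1. \tag{57} \]

Combining the three cases (54), (55), and (57) yields

\[ \tilde{\boldsymbol{x}}_j = \begin{cases} \displaystyle\sum_{l=0}^{m-1} \frac{\tilde{\boldsymbol{s}}_{2^l}}{\psi_{2^l}} \left( P_{C_l}(1) - P_{C_l}(0) \right), & j = 0, \\ \tilde{\boldsymbol{s}}_j / \psi_j, & j \in \{1, 2, \ldots, 2^{m-1}\}, \\ \boldsymbol{0}, & \text{otherwise}, \end{cases} \]

which satisfies (46)-(47). This completes the proof of the "only if" part. □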
Remark 3: Only (46) depends on the input distribution, not (47). In view of Theorem 9, the only difference between FOO constellations with uniform and nonuniform distributions lies in $\tilde{\boldsymbol{x}}_0$. The final theorem gives this statement a more intuitive interpretation.

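Theorem 11: The constellation $[\mathbb{X}, \mathbb{P}]$ is FOO if and only if both the following conditions hold:

\[ \boldsymbol{\mu} = \boldsymbol{0}, \tag{58} \]

\[ \tilde{\boldsymbol{x}}_j = \boldsymbol{0}, \quad \forall j \notin \{0\} \cup \{1, 2, 4, \ldots, M/2\}. \tag{59} \]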



Proof: We wish to prove that if (47) (or equivalently (59)) holds, then (46) and (58) are equivalent.

For any constellation $[\mathbb{X}, \mathbb{P}]$, the mean $\boldsymbol{\mu} = \tilde{\boldsymbol{s}}_0$, where $\tilde{\mathbb{S}}$ is the HT of $\mathbb{S}$, the transform of $\mathbb{X}$. This follows from Theorem 8 and (33). The mean can be calculated by letting $i = 0$ in (20) and (22) as

\[ \boldsymbol{\mu} = \psi_0 \sum_{i=0}^{M-1} \tilde{\boldsymbol{x}}_i \prod_{\substack{k=0 \\ n_{i,k} = 1}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(1) \right) \tag{60} \]

where $\psi_0 = 1$.

In this theorem, we are only interested in constellations that satisfy (47) or equivalently (59). For such constellations, the sum in (60) includes at most $m + 1$ nonzero terms, namely,

\[ \boldsymbol{\mu} = \tilde{\boldsymbol{x}}_0 + \sum_{k=0}^{m-1} \tilde{\boldsymbol{x}}_{2^k} \left( P_{C_k}(0) - P_{C_k}(1) \right). \]

This expression makes (46) and (58) equivalent. □
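The expression for the mean in terms of the HT can be sanity-checked numerically. The following sketch compares the direct mean $\sum_i P_i x_i$ with the HT-based sum over products of $(P_{C_k}(0) - P_{C_k}(1))$ for a random one-dimensional alphabet. It is only an illustration under assumptions: the HT uses a $1/M$ normalization with $h_{i,j} = \prod_k (-1)^{n_{i,k} n_{j,k}}$, symbol probabilities are products of bit probabilities, and `b[k]` is taken to be $P_{C_k}(0)$. All function names are hypothetical.

```python
import random

def hadamard_transform(X):
    """Assumed HT: x~_i = (1/M) * sum_j x_j * (-1)^popcount(i & j)."""
    M = len(X)
    return [sum((-1 if bin(i & j).count("1") % 2 else 1) * X[j] for j in range(M)) / M
            for i in range(M)]

def symbol_probs(b):
    """P_i = prod_k P_{C_k}(n_{i,k}); ASSUMES b[k] = P_{C_k}(0), bit 0 = LSB."""
    probs = []
    for i in range(2 ** len(b)):
        p = 1.0
        for k, bk in enumerate(b):
            p *= bk if ((i >> k) & 1) == 0 else 1.0 - bk
        probs.append(p)
    return probs

def mean_direct(X, b):
    """Mean computed as sum_i P_i x_i."""
    return sum(p * x for p, x in zip(symbol_probs(b), X))

def mean_from_ht(X, b):
    """Mean computed as sum_i x~_i * prod_{k: n_{i,k}=1} (P_{C_k}(0) - P_{C_k}(1))."""
    total = 0.0
    for i, xt in enumerate(hadamard_transform(X)):
        prod = 1.0
        for k, bk in enumerate(b):
            if (i >> k) & 1:
                prod *= bk - (1.0 - bk)
        total += xt * prod
    return total

if __name__ == "__main__":
    random.seed(0)
    X = [random.uniform(-3, 3) for _ in range(8)]
    b = [0.40, 0.55, 0.60]
    print(mean_direct(X, b), mean_from_ht(X, b))
```

The two values agree up to rounding for any alphabet, since the identity does not rely on (47). When (47) holds, all terms except those with index 0 or a power of two vanish.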
Based on Theorem 9, the result in Theorem 11 can be understood as follows. If a constellation with a uniform input distribution is FOO, it will still be FOO for any other input distribution, provided that the input alphabet is translated to be zero mean. In view of the geometrical interpretation of Theorem 9 given in [20, Theorem 12], the result in Theorem 11 also states that a constellation is FOO if and only if its input alphabet is a linear projection of a hypercube and it has zero mean.

We also note that the zero-mean condition in Theorem 11 is the same that guarantees FOO for the CM-MI [20, Footnote 12]. This implies that the only difference between FOO constellations for the CM-MI and the BICM-MI lies in the extra constraint on the input alphabet to be a linear projection of a hypercube.

C. Numerical Examples 

In this subsection we give numerical examples to illustrate 
the analytical results presented in this paper. 

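Example 7: Consider the so-called OTTO (one-three-three-one) alphabet in [20, Fig. 4(a)], which corresponds to a projected hypercube. This alphabet is

\[ \mathbb{X}_{\mathrm{OTTO}} = \begin{bmatrix} -1 & 1 & -3 & -1 & 1 & 3 & -1 & 1 \\ 0 & -2 & 0 & -2 & 2 & 0 & 2 & 0 \end{bmatrix}^{\mathsf{T}}. \tag{61} \]

The constellation $[\mathbb{X}_{\mathrm{OTTO}}, \mathbb{U}_3]$ was shown to be FOO in [20, Example 4].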

In this example, we are interested in the first-order behavior of the constellation $[\mathbb{X}_{\mathrm{OTTO}}, \mathbb{P}]$ for different $\mathbb{P}$. In view of Theorem 11, this constellation will be FOO if it has zero mean. Using (61) and (6) in (36), we find (after some algebra) that

\[ \boldsymbol{\mu} = \left[ 1 + 2\left( P_{C_1}(0) - P_{C_0}(0) - P_{C_2}(0) \right), \; 2\left( P_{C_0}(0) - P_{C_2}(0) \right) \right]. \tag{62} \]

For example, the bit probabilities $\boldsymbol{b}_7 = [0.40, 0.55, 0.60]$ give an input distribution $\mathbb{P}_7$ for which the mean (62) is $\boldsymbol{\mu} = [0.10, -0.40]$. We define another alphabet $\mathbb{X}'_{\mathrm{OTTO}}$ by subtracting $\boldsymbol{\mu}$ from each element in $\mathbb{X}_{\mathrm{OTTO}}$. The translated constellation $[\mathbb{X}'_{\mathrm{OTTO}}, \mathbb{P}_7]$ is shown in Fig. 9 along with the transform of $\mathbb{X}'_{\mathrm{OTTO}}$ for the distribution $\mathbb{P}_7$, used with $\mathbb{U}_3$. They are both zero-mean projected hypercubes and thus FOO according to Theorem 11. This can be observed in Fig. 11, where the BICM-MI for $[\mathbb{X}'_{\mathrm{OTTO}}, \mathbb{P}_7]$ is shown.

2 The parameter $\alpha$ for the CM-MI is [20, Theorem 7] $\alpha = \log_2 e \, (1 - \|\boldsymbol{\mu}\|^2 / E_s)$.

The exemplified method holds in full generality: Any alpha- 
bet that is FOO with a uniform input distribution is FOO also 
with an arbitrary nonuniform distribution, if it is translated to 
zero mean. Furthermore, all nonuniform FOO constellations 
can be constructed in this manner. 

For certain distributions, the mean (62) is zero without translation. Specifically, $\boldsymbol{\mu} = \boldsymbol{0}$ if and only if

\[ P_{C_0}(0) = P_{C_2}(0) = P_{C_1}(0)/2 + 1/4. \tag{63} \]

Clearly, the uniform case ($P_{C_0}(0) = P_{C_1}(0) = P_{C_2}(0) = 1/2$) analyzed in [20] fulfills (63). More interestingly, when any other vector of bit probabilities fulfilling (63) is used, the resulting constellation will be FOO. This is the case for instance with $\boldsymbol{b}_8 = [0.70, 0.90, 0.70]$. The obtained constellation, denoted by $[\mathbb{X}_{\mathrm{OTTO}}, \mathbb{P}_8]$, is illustrated in Fig. 10 along with its transform with $\mathbb{U}_3$. Graphically, both alphabets look like cubes, although viewed from different angles, which is precisely what Theorems 9 and 11 predict.

In Fig. 11, we show the BICM-MI for the zero-mean constellations $[\mathbb{X}'_{\mathrm{OTTO}}, \mathbb{P}_7]$ and $[\mathbb{X}_{\mathrm{OTTO}}, \mathbb{P}_8]$ as well as for their transforms with the uniform distribution $\mathbb{U}_3$. As expected, all the MIs converge at the SL for low SNR. For these two cases, the transformed alphabets with uniform input distributions give larger MI for all SNRs compared to the corresponding nonuniform ones. This is however not always the case, cf. Fig. 6 with $\mathbb{P}_3$.
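The numbers in Example 7 can be reproduced with a short script. The sketch below is hedged in two ways. First, the OTTO alphabet used here is a reconstruction, not a copy of (61): it is the projected cube $\boldsymbol{x}_j = \sum_k (-1)^{n_{j,k}} \boldsymbol{v}_k$ whose generators $\boldsymbol{v}_0 = [-1, 1]$, $\boldsymbol{v}_1 = [1, 0]$, $\boldsymbol{v}_2 = [-1, -1]$ were chosen so that the mean matches $[0.10, -0.40]$ for $\boldsymbol{b}_7$ and condition (63). Second, it assumes `b[k]` $= P_{C_k}(0)$ with bit 0 as the least significant bit. All names are hypothetical.

```python
# ASSUMPTION: generators of a projected cube consistent with the mean values in Example 7.
GENERATORS = [(-1.0, 1.0), (1.0, 0.0), (-1.0, -1.0)]

def otto_alphabet():
    """x_j = sum_k (-1)^{n_{j,k}} v_k for j = 0..7 (reconstructed OTTO alphabet)."""
    points = []
    for j in range(8):
        x = y = 0.0
        for k, (vx, vy) in enumerate(GENERATORS):
            sign = -1.0 if (j >> k) & 1 else 1.0
            x += sign * vx
            y += sign * vy
        points.append((x, y))
    return points

def symbol_probs(b):
    """P_i = prod_k P_{C_k}(n_{i,k}); ASSUMES b[k] = P_{C_k}(0)."""
    probs = []
    for i in range(2 ** len(b)):
        p = 1.0
        for k, bk in enumerate(b):
            p *= bk if ((i >> k) & 1) == 0 else 1.0 - bk
        probs.append(p)
    return probs

def mean(X, b):
    """Mean sum_i P_i x_i of a two-dimensional alphabet."""
    P = symbol_probs(b)
    return (sum(p * x[0] for p, x in zip(P, X)), sum(p * x[1] for p, x in zip(P, X)))

def mean_closed_form(b):
    """Closed form of the mean for this alphabet: [1 + 2(b1 - b0 - b2), 2(b0 - b2)]."""
    return (1.0 + 2.0 * (b[1] - b[0] - b[2]), 2.0 * (b[0] - b[2]))

if __name__ == "__main__":
    X = otto_alphabet()
    print(sorted(x for x, _ in X))          # one -3, three -1, three 1, one 3
    print(mean(X, [0.40, 0.55, 0.60]))      # mean for b_7
    print(mean(X, [0.70, 0.90, 0.70]))      # mean for b_8, which fulfills (63)
```

With this reconstruction, the first coordinates take the values $-3, -1, 1, 3$ with multiplicities one, three, three, one, which is consistent with the name of the alphabet.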

Fig. 11. BICM-MI versus $E_b/N_0$ [dB] for the four FOO constellations in Figs. 9 and 10. The SL is shown with a white circle.

Example 8: $M$-ary pulse amplitude modulation (PAM) alphabets have been shown to be FOO if the NBC is used with a uniform input distribution, i.e., the constellation $[\mathbb{X}_{\mathrm{PAM}}, \mathbb{U}_m]$ with $\mathbb{X}_{\mathrm{PAM}} = [-(M-1), -(M-3), \ldots, M-1]$ is FOO for all $m$ [20, Theorem 14]. In this example, we study the first-order behavior of the Gray-labeled 8-PAM alphabet $[-7, 7, -1, 1, -5, 5, -3, 3]$. The Gray-labeled constellation is known not to be FOO for $\mathbb{U}_3$ [16, Theorem 3].

The BICM-MI of the Gray-labeled alphabet is shown in Fig. 12 for the set of bit probabilities $\boldsymbol{b}_9 = [0.5, p, p]$ for different values of $p$. For $p = 0.5$, the uniform distribution is obtained. As $p$ decreases,
the Gray-labeled constellation approaches a zero-mean binary 
alphabet, which is FOO E3 Fig. 2] |20| Fig. 3 (b)]. The results 
in Fig. [12] show the tradeoff between the low- and high-SNR 
regimes: The SL can be approached by decreasing p, but this 
causes a decrease in MI in the high-SNR regime. Alternatively, 
the SL can be attained by switching from the Gray code to 
the NBC, but this also comes with a heavy penalty at higher 
SNRs. 
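The limiting behavior described in Example 8 can be made concrete with a small computation. The sketch below evaluates the symbol probabilities of the Gray-labeled 8-PAM alphabet for bit probabilities $[0.5, p, p]$ and reports how much probability mass falls on the two dominant symbols. It assumes that the entries of the bit-probability vector are $P_{C_k}(0)$ with bit 0 as the least significant bit, under which the dominant symbols are $\pm 3$. The names are hypothetical.

```python
GRAY_8PAM = [-7, 7, -1, 1, -5, 5, -3, 3]  # alphabet from Example 8, indexed by NBC label

def symbol_probs(b):
    """P_i = prod_k P_{C_k}(n_{i,k}); ASSUMES b[k] = P_{C_k}(0), bit 0 = LSB."""
    probs = []
    for i in range(2 ** len(b)):
        p = 1.0
        for k, bk in enumerate(b):
            p *= bk if ((i >> k) & 1) == 0 else 1.0 - bk
        probs.append(p)
    return probs

def summary(p):
    """Return (mean, mass on the symbols -3 and +3) for bit probabilities [0.5, p, p]."""
    P = symbol_probs([0.5, p, p])
    mu = sum(pi * x for pi, x in zip(P, GRAY_8PAM))
    mass = sum(pi for pi, x in zip(P, GRAY_8PAM) if abs(x) == 3)
    return mu, mass

def is_gray(alphabet):
    """Check that labels of neighboring amplitude levels differ in exactly one bit."""
    order = sorted(range(len(alphabet)), key=lambda i: alphabet[i])
    return all(bin(a ^ c).count("1") == 1 for a, c in zip(order, order[1:]))

if __name__ == "__main__":
    print(is_gray(GRAY_8PAM))
    for p in (0.5, 0.3, 0.1, 0.01):
        print(p, summary(p))
```

Under the stated assumption, the mean stays at zero for every $p$ (the first bit is equiprobable and flips the sign of the symbol), while the mass on $\pm 3$ equals $(1-p)^2$ and approaches one as $p$ decreases, which illustrates the approach to a zero-mean binary alphabet.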




[Figure: MI curves versus $E_b/N_0$ [dB].]

Fig. 12. The BICM-MI for a Gray-labeled 8-PAM alphabet with uniform input distribution and with bit probabilities $\boldsymbol{b}_9 = [0.5, p, p]$ for different values of $p$. The constellations approach a binary FOO constellation as $p \to 0$. The NBC-labeled 8-PAM alphabet $\mathbb{X}_{\mathrm{PAM}}$, although FOO with a uniform distribution, is considerably weaker than the Gray-labeled alphabet for a wide range of SNRs.



Appendix 
Properties of the Transform 



In this Appendix, some theoretical properties of the new transform defined in Section III-E are proved. In the first section, a ubiquitous lemma is established, the "sum-product lemma," which will be used extensively throughout the appendix. In the following two sections, Lemma 1 and Theorem 4 are proved, resp.



V. Conclusions 

There exists a closed-form mapping between any probabilistically shaped constellation and a constellation with uniform input distribution, such that the two systems have the same low-SNR BICM-MI (Definition 3 and Theorem 8). Thus, the combination of probabilistic and geometric shaping is equivalent to pure geometric shaping at low SNR.

We are particularly interested in BICM systems that attain the SL -1.59 dB at asymptotically low SNR, i.e., in FOO constellations. Somewhat disappointingly, the set of probabilistically shaped FOO constellations is no larger than the set of FOO constellations with uniform distributions, disregarding translations of the whole input alphabet. Both sets can be fully characterized as the set of linear projections of a hypercube, translated to have zero mean for the considered input distribution (Theorems 9 and 10, cf. Figs. 9 and 10). Although non-FOO constellations for BICM can be improved by probabilistic shaping (Fig. 12), it is impossible to make them FOO except in degenerate cases (by setting some probabilities equal to zero).



A. The Sum-Product Lemma 

Many properties of NBC-labeled constellations and their 
transforms can be expressed as the sum of products of certain 
functions, where each function depends on one bit position. 
Such expressions, which occur frequently in the two subse- 
quent sections, can be resolved using this general lemma. 

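Lemma 12: Let $f_{k,u}$ for $k = 0, \ldots, m-1$ and $u \in \mathbb{B}$ be any real numbers. Then

\[ \sum_{i=0}^{M-1} \prod_{k=0}^{m-1} f_{k,n_{i,k}} = \prod_{k=0}^{m-1} \left( f_{k,0} + f_{k,1} \right). \]

Proof: A summation over $i = 0, \ldots, 2^m - 1$ is equivalent to $m$ sums over $i_k \in \mathbb{B}$, where $k = 0, \ldots, m-1$ and $i = i_0 + 2 i_1 + \cdots + 2^{m-1} i_{m-1}$. With this notation, $n_{i,k} = i_k$ and

\begin{align*}
\sum_{i=0}^{M-1} \prod_{k=0}^{m-1} f_{k,n_{i,k}}
&= \sum_{i_0 \in \mathbb{B}} \sum_{i_1 \in \mathbb{B}} \cdots \sum_{i_{m-1} \in \mathbb{B}} \prod_{k=0}^{m-1} f_{k,i_k} \\
&= \sum_{i_0 \in \mathbb{B}} \sum_{i_1 \in \mathbb{B}} \cdots \sum_{i_{m-1} \in \mathbb{B}} f_{0,i_0} f_{1,i_1} \cdots f_{m-1,i_{m-1}} \\
&= \sum_{i_0 \in \mathbb{B}} f_{0,i_0} \sum_{i_1 \in \mathbb{B}} f_{1,i_1} \cdots \sum_{i_{m-1} \in \mathbb{B}} f_{m-1,i_{m-1}} \\
&= \prod_{k=0}^{m-1} \sum_{i_k \in \mathbb{B}} f_{k,i_k}.
\end{align*}

□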
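The sum-product lemma is easy to confirm by brute force. The following sketch, with hypothetical function names, evaluates both sides for an arbitrary table of numbers `f[k][u]`, assuming as in the proof that $n_{i,k}$ is bit $k$ of $i$ with $k = 0$ the least significant bit.

```python
import random

def sum_of_products(f):
    """Left-hand side: sum over i = 0..2^m - 1 of prod_k f[k][n_{i,k}]."""
    m = len(f)
    total = 0.0
    for i in range(2 ** m):
        prod = 1.0
        for k in range(m):
            prod *= f[k][(i >> k) & 1]
        total += prod
    return total

def product_of_sums(f):
    """Right-hand side: prod_k (f[k][0] + f[k][1])."""
    prod = 1.0
    for fk in f:
        prod *= fk[0] + fk[1]
    return prod

if __name__ == "__main__":
    random.seed(0)
    f = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(4)]
    print(sum_of_products(f), product_of_sums(f))
```

The left-hand side costs on the order of $m 2^m$ multiplications, whereas the right-hand side needs only $m$, which is why the lemma is convenient for resolving the sums that appear in the following sections.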



B. The Transform Coefficients

Lemma 1, which lists two fundamental properties of the transform coefficients $g_{i,j}$, was given in Section III-E.

Proof of Lemma 1: The two parts of Lemma 1 will now be proved separately. First, the definition of $g_{i,j}$ in (12) yields



A/-1 M-Ito-1 

E 9i,l9i,j 

i=0 i=0 k=Q 



vi — i m—l 

^ n [(-it^vpcW) 



(-lf^'v^J) 



A/-1 m-1 

]T J] [(-l)^- fe( "'' fe+ "^ ) Pa fc (0) 



i=0 fc=0 



+ ^P Ch (0)P Ck (1) 

+ (-1)"^ y'p^ (0)P Cfc (1) 

+ (_l)»i.*C««,*+«i.*)p Cjb (l) 

M-1 m-1 

J^J ^_]^«»,fc(™I,k+«,7,fc) 

i=0 fc=0 

. [(_i r .*+^P Cfc (0) + P Cfc (l) 



+ [(-!)»«.* + (-1)"^] y/P Ck (0)Pc k M 



where the last equality follows by repeatedly using the iden- 
tities u = 1 - u and = (-1)"" for u e B. LemmaQl] 



is applicable in this case too and yields 

A/-1 m-1 



E II [(-ir ! ' fc+ " 3 ' fc ^c fc (o) + p Cfc (i) 



i=0 



fc=0 



+ [(-!)"'•* + (-l)'" J ' fc ] ^c fc (0)Pc fc (l) 

+ ((-l)"^+' l ^ fc P Cfc (0) + P Cfc (l) 

- [(-I)"'.* + (-1)"^] y/PcMPcM) 

!-l 



k=0 



+ (l + (-l)™ ! .*+"^)p cfe (l) 



-1)"' 



V^c t (0)P Cfc (l) 



[ (l + (-1)™'.*+"^) 



(64) 



fe=0 



where the last step follows because $P_{C_k}(0) + P_{C_k}(1) = 1$. The factors in (64) are either 2 or 0, depending on whether $n_{l,k} = n_{j,k}$ or $n_{l,k} \neq n_{j,k}$ for the particular bit position $k$. Thus, $\sum_{i=0}^{M-1} g_{i,l} g_{i,j}$ is either $2^m = M$ or 0, depending on whether $l$ and $j$ have all bits equal or not. This completes the proof of (13).

To prove the second part of Lemma 1, which is (14), we observe from (8) and (12) that



M-1 



M-1 m-1 



e = e n^ 1 )" 1 ^ 

i=0 i=0 fc=0 

m — 1 



fc=0 



A/-1 m-1 

e n ^ 

i=0 fe=0 



(65) 



where 



cf> k ,u = (-i) uni ' k+ ^ k VP^M 



Intending to apply Lemma [T2] to 
quantity 



+ (-l) mi ' M ^VP^. 

we first calculate the 



+ (-ir fc v^jo) + (-i) n, - t+fi - fc A/^(i) 

for fc = 0, . . . , m — 1. For reasons that will soon become 
clear, we extract a common factor (— l) n j.fc™i.fc from all terms, 
obtaining 

0fe.o + 0fc.i - (-!)*'■*"'■* 



+ ^p Cfc (0) 



+ + ^P Cfc (l) 



The coefficient in front of $\sqrt{P_{C_k}(0)}$ is 2 if $n_{j,k} = n_{l,k}$ and 0 otherwise. Similarly, the coefficient in front of $\sqrt{P_{C_k}(1)}$ is 0 if $n_{j,k} = n_{l,k}$ and 2 otherwise. Thus, for every $k$, the expression depends on either $P_{C_k}(0)$ or $P_{C_k}(1)$ but not both, namely,



9fc,0 + <Pk,l 



(-l)W.* ■ 2/^(1), n jtk + n Lk 

= 2(-l)W-*yJp Ck (n j ,k®ri l , k ) 

= 2(-lT^ n >-^P Ck (n m , k ) (66) 

according to (0. 

Using (66), we are now ready to apply Lemma 12 to (65), which yields



M-l 



m—l 



E h ^9i,j = II 2(-l)"- fc "^JPc fc (^, fc ) 



fe=0 



Applying (O and dHJ, we obtain finally 



M-l 



i=0 



k =0 



fc=0 



which completes the proof of (14).



□ 



C. Two Consecutive Transforms 

Theorem 4, which characterizes the vectors obtained by applying the new transform and the HT sequentially to a given alphabet, was given in Section III-E.

Proof of Theorem 4: To prove (20), we first write $\tilde{\mathbb{S}}$ as a function of $\mathbb{X}$ using (10) and (18), as illustrated in Fig. 2. For $i = 0, \ldots, M-1$,



M-l 



— y sjhn 

M ^ 3 J 

^ M-l M-l 

^ E E x m,i\fPi 

3=0 1=0 
^ M-l M-l 

— e ^v^E 



M 

M-l 



1=0 

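using (14) for the last equality. To obtain $\tilde{\mathbb{S}}$ as a function of $\tilde{\mathbb{X}}$, we express $\boldsymbol{x}_l$ in terms of its inverse HT in (11), to obtain

\[ \tilde{\boldsymbol{s}}_i = \sum_{l=0}^{M-1} h_{i,l} \sqrt{P_l P_{l \oplus i}} \sum_{j=0}^{M-1} \tilde{\boldsymbol{x}}_j h_{l,j} = \sum_{j=0}^{M-1} \tilde{\boldsymbol{x}}_j \eta_{i,j} \tag{67} \]

where

\[ \eta_{i,j} \triangleq \sum_{l=0}^{M-1} h_{l,i} h_{j,l} \sqrt{P_l P_{l \oplus i}} \]

and $l \oplus i$ denotes the index whose bits are $n_{l,k} \oplus n_{i,k}$.

The coefficient $\eta_{i,j}$ can be expressed using (8) and (6) as

\begin{align}
\eta_{i,j} &= \sum_{l=0}^{M-1} \prod_{k=0}^{m-1} (-1)^{n_{l,k}(n_{i,k} + n_{j,k})} \sqrt{P_{C_k}(n_{l,k}) P_{C_k}(n_{l,k} \oplus n_{i,k})} \tag{68} \\
&= \prod_{k=0}^{m-1} \left[ \sqrt{P_{C_k}(0) P_{C_k}(n_{i,k})} + (-1)^{n_{i,k} + n_{j,k}} \sqrt{P_{C_k}(1) P_{C_k}(\bar{n}_{i,k})} \right] \tag{69}
\end{align}

where to pass from (68) to (69) we used Lemma 12 with $f_{k,u} = (-1)^{u(n_{i,k} + n_{j,k})} \sqrt{P_{C_k}(u) P_{C_k}(n_{i,k} \oplus u)}$. Depending on the value of $n_{i,k}$, this product can be partitioned into two subproducts

\begin{align}
\eta_{i,j} &= \prod_{\substack{k=0 \\ n_{i,k} = 0}}^{m-1} \left[ P_{C_k}(0) + (-1)^{n_{j,k}} P_{C_k}(1) \right] \prod_{\substack{k=0 \\ n_{i,k} = 1}}^{m-1} \left[ \sqrt{P_{C_k}(0) P_{C_k}(1)} + (-1)^{\bar{n}_{j,k}} \sqrt{P_{C_k}(1) P_{C_k}(0)} \right] \nonumber \\
&= \psi_i \prod_{\substack{k=0 \\ n_{i,k} = 0}}^{m-1} \left[ P_{C_k}(0) + (-1)^{n_{j,k}} P_{C_k}(1) \right] \prod_{\substack{k=0 \\ n_{i,k} = 1}}^{m-1} \frac{1 + (-1)^{\bar{n}_{j,k}}}{2} \tag{70}
\end{align}

making use of $\psi_i$ defined in (22). Since the two factors in (70) depend on $n_{j,k}$ as

\[ P_{C_k}(0) + (-1)^{n_{j,k}} P_{C_k}(1) = \begin{cases} 1, & n_{j,k} = 0, \\ P_{C_k}(0) - P_{C_k}(1), & n_{j,k} = 1, \end{cases} \]

\[ \frac{1 + (-1)^{\bar{n}_{j,k}}}{2} = \begin{cases} 0, & n_{j,k} = 0, \\ 1, & n_{j,k} = 1, \end{cases} \]

they can be expanded into four factors as

\[ \eta_{i,j} = \psi_i \prod_{\substack{k=0 \\ n_{i,k} = 0 \\ n_{j,k} = 0}}^{m-1} 1 \prod_{\substack{k=0 \\ n_{i,k} = 0 \\ n_{j,k} = 1}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(1) \right) \prod_{\substack{k=0 \\ n_{i,k} = 1 \\ n_{j,k} = 0}}^{m-1} 0 \prod_{\substack{k=0 \\ n_{i,k} = 1 \\ n_{j,k} = 1}}^{m-1} 1. \tag{71} \]

The two products of ones can be omitted. The product of zeros may look strange but is perfectly legitimate; its value is by definition 1 if the set $\{k : [n_{i,k}, n_{j,k}] = [1, 0]\}$ is empty and 0 otherwise. Furthermore, since $P_{C_k}(0) - P_{C_k}(n_{j,k}) = 0$ for all $k$ in this set, the second and third factors of (71) can be merged into

\[ \eta_{i,j} = \psi_i \prod_{\substack{k=0 \\ n_{i,k} \neq n_{j,k}}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(n_{j,k}) \right) \tag{72} \]

which together with (67) completes the proof of (20).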

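To prove the reverse relationship (21), we first multiply both sides of (67) by $h_{i,M-1} \eta_{l,i} / (\psi_i \psi_l)$ and sum over $i$:

\[ \sum_{i=0}^{M-1} \tilde{\boldsymbol{s}}_i \frac{h_{i,M-1} \eta_{l,i}}{\psi_i \psi_l} = \sum_{j=0}^{M-1} \tilde{\boldsymbol{x}}_j \sum_{i=0}^{M-1} \frac{h_{i,M-1} \eta_{i,j} \eta_{l,i}}{\psi_i \psi_l}. \tag{73} \]

The Hadamard coefficient $h_{i,M-1}$ can be factorized using (8) with $n_{M-1,k} = 1, \forall k$, as

\[ h_{i,M-1} = \prod_{k=0}^{m-1} (-1)^{n_{i,k}}. \tag{74} \]

Using (74) in (73) and replacing the ratios $\eta_{i,j}/\psi_i$ and $\eta_{l,i}/\psi_l$ with their factorizations according to (72), we obtain for the inner sum on the right-hand side of (73)

\[ \sum_{i=0}^{M-1} \frac{h_{i,M-1} \eta_{i,j} \eta_{l,i}}{\psi_i \psi_l} = \sum_{i=0}^{M-1} \prod_{k=0}^{m-1} \theta_{k,n_{i,k}} \tag{75} \]

where

\[ \theta_{k,u} = \begin{cases} (-1)^u, & u = n_{j,k} = n_{l,k}, \\ (-1)^u \left( P_{C_k}(0) - P_{C_k}(u) \right), & u = n_{j,k} \neq n_{l,k}, \\ (-1)^u \left( P_{C_k}(0) - P_{C_k}(n_{j,k}) \right), & u = n_{l,k} \neq n_{j,k}, \\ (-1)^u \left( P_{C_k}(0) - P_{C_k}(n_{j,k}) \right) \left( P_{C_k}(0) - P_{C_k}(u) \right), & u \neq n_{j,k} = n_{l,k}. \end{cases} \]

The fourth case is always 0, because either $P_{C_k}(0) - P_{C_k}(n_{j,k})$ or $P_{C_k}(0) - P_{C_k}(u)$ is 0. Furthermore, the second and third cases can be combined into

\[ \theta_{k,u} = \begin{cases} (-1)^u, & u = n_{j,k} = n_{l,k}, \\ (-1)^u \left( P_{C_k}(0) - P_{C_k}(n_{j,k}) \right), & n_{j,k} \neq n_{l,k}, \\ 0, & u \neq n_{j,k} = n_{l,k}. \end{cases} \tag{76} \]

We wish to apply Lemma 12 to the right-hand side of (75). In order to do so, we first calculate from (76)

\begin{align}
\theta_{k,0} + \theta_{k,1} &= \begin{cases} 1 + 0, & n_{j,k} = n_{l,k} = 0, \\ \left( P_{C_k}(0) - P_{C_k}(n_{j,k}) \right) - \left( P_{C_k}(0) - P_{C_k}(n_{j,k}) \right), & n_{j,k} \neq n_{l,k}, \\ 0 - 1, & n_{j,k} = n_{l,k} = 1 \end{cases} \nonumber \\
&= \begin{cases} (-1)^{n_{l,k}}, & n_{j,k} = n_{l,k}, \\ 0, & n_{j,k} \neq n_{l,k}. \end{cases} \tag{77}
\end{align}

Now by Lemma 12, (75) can be expressed as

\[ \sum_{i=0}^{M-1} \frac{h_{i,M-1} \eta_{i,j} \eta_{l,i}}{\psi_i \psi_l} = \prod_{k=0}^{m-1} \left( \theta_{k,0} + \theta_{k,1} \right). \]

By (77), this product will be nonzero only if $j$ and $l$ match in all bit positions $k = 0, \ldots, m-1$, i.e., if $j = l$. Thus, again utilizing (74),

\[ \sum_{i=0}^{M-1} \frac{h_{i,M-1} \eta_{i,j} \eta_{l,i}}{\psi_i \psi_l} = \begin{cases} h_{l,M-1}, & j = l, \\ 0, & j \neq l. \end{cases} \tag{78} \]

Finally, we combine (73) and (78) into

\[ \sum_{i=0}^{M-1} \tilde{\boldsymbol{s}}_i \frac{h_{i,M-1} \eta_{l,i}}{\psi_i \psi_l} = h_{l,M-1} \tilde{\boldsymbol{x}}_l. \tag{79} \]

Dividing both sides by $h_{l,M-1}$ yields on the left-hand side the coefficient $h_{i,M-1} \eta_{l,i} / (h_{l,M-1} \psi_i \psi_l)$, which can be expressed using (8) and (72) as

\begin{align*}
\frac{h_{i,M-1} \eta_{l,i}}{h_{l,M-1} \psi_i \psi_l} &= \frac{1}{\psi_i} \prod_{k=0}^{m-1} (-1)^{n_{i,k} - n_{l,k}} \prod_{\substack{k=0 \\ n_{i,k} \neq n_{l,k}}}^{m-1} \left( P_{C_k}(0) - P_{C_k}(n_{i,k}) \right) \\
&= \frac{1}{\psi_i} \prod_{\substack{k=0 \\ n_{i,k} \neq n_{l,k}}}^{m-1} (-1) \left( P_{C_k}(0) - P_{C_k}(n_{i,k}) \right).
\end{align*}

This expression, substituted into (79), completes the proof of (21). □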
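The factorization of the coefficient $\eta_{i,j}$ used in this proof can be checked numerically. The sketch below compares the defining sum over Hadamard coefficients and square roots of symbol probabilities with the closed-form product. It rests on several assumptions that are not taken verbatim from the text: $h_{i,j} = \prod_k (-1)^{n_{i,k} n_{j,k}}$, symbol probabilities are products of bit probabilities with `b[k]` $= P_{C_k}(0)$, the second probability in the sum belongs to the index with bits $n_{l,k} \oplus n_{i,k}$, and $\psi_i = \prod_{k: n_{i,k} = 1} 2\sqrt{P_{C_k}(0) P_{C_k}(1)}$. All function names are hypothetical.

```python
import math

def bit(i, k):
    """Bit k of index i (k = 0 is the least significant bit)."""
    return (i >> k) & 1

def h(i, j):
    """Assumed Hadamard coefficient (-1)^{sum_k n_{i,k} n_{j,k}}."""
    return -1 if bin(i & j).count("1") % 2 else 1

def symbol_prob(i, b):
    """P_i = prod_k P_{C_k}(n_{i,k}); ASSUMES b[k] = P_{C_k}(0)."""
    p = 1.0
    for k, bk in enumerate(b):
        p *= bk if bit(i, k) == 0 else 1.0 - bk
    return p

def eta_sum(i, j, b):
    """eta_{i,j} as a sum over l of h_{l,i} h_{j,l} sqrt(P_l P_{l xor i})."""
    M = 2 ** len(b)
    return sum(h(l, i) * h(j, l) * math.sqrt(symbol_prob(l, b) * symbol_prob(l ^ i, b))
               for l in range(M))

def eta_product(i, j, b):
    """eta_{i,j} as psi_i * prod_{k: n_{i,k} != n_{j,k}} (P_{C_k}(0) - P_{C_k}(n_{j,k}))."""
    value = 1.0
    for k, bk in enumerate(b):
        p0, p1 = bk, 1.0 - bk
        ni, nj = bit(i, k), bit(j, k)
        if ni == 1:
            value *= 2.0 * math.sqrt(p0 * p1)      # factor of psi_i (assumed form)
        if ni != nj:
            value *= p0 - (p0 if nj == 0 else p1)  # zero whenever n_{j,k} = 0
    return value

if __name__ == "__main__":
    b = [0.40, 0.55, 0.60]
    worst = max(abs(eta_sum(i, j, b) - eta_product(i, j, b))
                for i in range(8) for j in range(8))
    print("largest deviation:", worst)
```

Agreement of the two functions for all index pairs supports the product form, but only under the assumptions listed above.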